PhD Defense: Deep Video Analytics of Humans: from Action Recognition to Forgery Detection
Steven Schwarcz
Thursday, August 5, 2021, 1:00-3:00 pm
Abstract
In this work, we explore a variety of techniques and applications for addressing visual problems involving videos of humans in the contexts of activity detection, pose detection, and forgery detection.

The first works discussed here address human activity detection in untrimmed video where the actions performed are spatially and temporally sparse. The video may therefore contain long sequences of frames where no actions occur, and the actions that do occur will often occupy only a very small percentage of the pixels on the screen. We address this with a two-stage architecture that first suggests many coarse proposals with high recall, and then classifies and refines them into temporally accurate activity proposals. We present two methods that follow this high-level paradigm: TRI-3D and CHUNK-3D.

This work on activity detection is then extended to include results on few-shot learning. In this domain, a system must learn to perform detection given only an extremely limited set of training examples. To solve this problem, we propose a method we call a Self-Denoising Neural Network (SDNN), which takes inspiration from Denoising Autoencoders and which we apply to both activity detection and image classification. We also propose a method that performs optical character recognition on real-world images when no labels are available in the language we wish to transcribe. Specifically, we build an accurate transcription system for Hebrew street name signs when no labeled training data is available.

We continue our analysis by proposing a method for automatic detection of facial forgeries in videos and images. This work approaches the problem of facial forgery detection by breaking the face into multiple regions and training a separate classifier for each region. The end result is a collection of high-quality facial forgery detectors that are both accurate and explainable. We exploit this explainability by providing extensive empirical analysis of our method’s results.

Finally, we present work that focuses on multi-camera, multi-person 3D human pose estimation from video. To address this problem, we aggregate the outputs of a 2D human pose detector across cameras and actors using a novel factor graph formulation, which we optimize using the loopy belief propagation algorithm.

Examining Committee:

Chair:       Dr. Rama Chellappa
Dean's rep:  Dr. David Jacobs
Members:     Dr. Christopher Metzler
             Dr. Shuvra Bhattacharyya
             Dr. Abhinav Shrivastava
                                        
Bio

Steven Schwarcz is a PhD student in the Computer Science Department at the University of Maryland, College Park, advised by Dr. Rama Chellappa. His research interests include Computer Vision and Machine Learning, with a focus on video analysis of humans. In late August, he will begin a position as an Applied Scientist at Amazon.

This talk is organized by Tom Hurst