PhD Defense: Feedback for Vision
Michael Maynord
Wednesday, May 1, 2024, 3:00-5:00 pm
Abstract
This thesis explores the application of feedback in image and action understanding, as well as video monitoring. It introduces Mid-Vision Feedback (MVF), a mechanism that modulates perception based on high-level categorical expectations, enhancing accuracy and contextual consistency in object classification. This approach is extended to action understanding through Sub-Action Modulation (SAM), which incorporates context into action interpretation by hierarchically grouping action primitives. SAM demonstrates superior performance over various video understanding architectures, improving action recognition and anticipation accuracies. Additionally, a configurable perception pipeline architecture, the Image Surveillance Assistant (ISA), is presented to aid watchstanders in video surveillance tasks by integrating human-specified expectations into the perceptual loop. Lastly, taking inspiration from contextual contrasting in MVF, a learning formulation for motion and context separation is proposed, showing improvements in action recognition and anticipation accuracies across multiple datasets.
 
Examining Committee

Chair: Dr. John Aloimonos
Dean's Representative: Dr. Shihab Shamma
Members: Dr. Cornelia Fermüller, Dr. Dinesh Manocha, Dr. Ramani Duraiswami

Bio

Michael Maynord, a PhD candidate studying AI and Computer Vision, has conducted research across classical symbolic methods, cognitive architectures, and deep learning. Motivated by the integration of high-level reasoning and perception, Maynord's research explores the connection between knowledge and visual perception. This extends to image understanding and action understanding in video, with a dedicated investigation into feedback mechanisms in vision that align perception with higher-level world understanding. Recent projects include contributions to medical image analysis, particularly detecting lung nodules and discerning manifestations of multiple sclerosis.

This talk is organized by Migo Gui