PhD Defense: Feedback for Vision

Michael Maynord

IRB-4105 Brendan Iribe Center for Computer Science and Engineering (IRB)

Wednesday, May 1, 2024, 3:00-5:00 pm

Abstract

This thesis explores the application of feedback in image and action understanding, as well as video monitoring. It introduces Mid-Vision Feedback (MVF), a mechanism that modulates perception based on high-level categorical expectations, enhancing accuracy and contextual consistency in object classification. This approach is extended to action understanding through Sub-Action Modulation (SAM), which incorporates context into action interpretation by hierarchically grouping action primitives. SAM demonstrates superior performance over various video understanding architectures, improving action recognition and anticipation accuracies. Additionally, a configurable perception pipeline architecture, the Image Surveillance Assistant (ISA), is presented to aid watchstanders in video surveillance tasks by integrating human-specified expectations into the perceptual loop. Lastly, taking inspiration from contextual contrasting in MVF, a learning formulation for motion and context separation is proposed, showing improvements in action recognition and anticipation accuracies across multiple datasets.

Examining Committee

Chair:	Dr. John Aloimonos
Dean's Representative:	Dr. Shihab Shamma
Members:	Dr. Cornelia Fermüller
	Dr. Dinesh Manocha
	Dr. Ramani Duraiswami

Bio

Michael Maynord is a PhD candidate studying AI and Computer Vision whose research spans classical symbolic methods, cognitive architectures, and deep learning. Motivated by the integration of high-level reasoning and perception, this research explores the connection between knowledge and visual perception. It covers image understanding and action understanding in video, including a dedicated investigation of feedback mechanisms in vision that synchronize perception with higher-level world understanding. Recent projects include contributions to medical image analysis, particularly detecting lung nodules and discerning manifestations of multiple sclerosis.

This talk is organized by Migo Gui