Compositional and Robust Action Understanding
Huijuan Xu
https://umd.zoom.us/j/94543765116?pwd=clY3MVV5Z1g4T2xpdnJMdjFiMFhYdz09
Tuesday, March 30, 2021, 1:00–2:00 pm
Abstract

In an era when massive video data is becoming available from a wide range of applications (e.g., smart home devices, medical instruments, intelligent transportation networks), designing algorithms that understand actions can enable machines to interact meaningfully with human partners. Practically, continuous video streams require temporal localization of actions before a trimmed action recognition method can be applied, yet such annotation is expensive and suffers from consistency issues. Moreover, early video understanding technologies mostly used holistic frame modeling and lacked reasoning capabilities. In this talk, I will discuss how to detect actions in continuous video streams efficiently. Specifically, I will present several temporal action detection models with different levels of supervision. Next, I will introduce how to understand actions compositionally, with localized foreground subjects or objects, to reduce the effect of confounding variables and to build a connection to common knowledge about the involved objects. Additionally, natural language provides an efficient and intuitive way to convey the details of an action to a human. I will conclude the talk with some perspectives on how compositional and efficient modeling opens the door to real-world action understanding with high complexity and fine granularity.

Bio

Huijuan Xu is a postdoctoral scholar in the EECS department at UC Berkeley, advised by Prof. Trevor Darrell. Her research focuses on deep learning, computer vision, and natural language processing, particularly in the area of action understanding in videos. Specifically, she has investigated efficient action detection, compositional action understanding, and action description using language. Her R-C3D work received the Most Innovative Award in the ActivityNet Challenge. Prior to UC Berkeley, she received her PhD from the computer science department at Boston University in 2018 and interned at Disney Research, Pittsburgh. She was selected as a 2020 Rising Star in EECS.

This talk is organized by Richa Mathur