Context has long been one of the most important aspects of computer vision research because it provides useful guidance for solving various tasks in both the spatial and temporal domains. With the recent rise of deep learning, deep networks have shown impressive performance on many computer vision tasks. Modeling context both explicitly and implicitly within deep networks can further boost the effectiveness and efficiency of deep models.
In the spatial domain, implicitly modeling context can be useful for learning discriminative texture representations. We propose an effective deep fusion architecture that captures both the second-order and first-order statistics of texture features. Meanwhile, explicitly modeling context is also important for challenging tasks such as fine-grained classification. We then propose a deep multi-task network that explicitly captures geometric constraints by simultaneously conducting fine-grained classification and key-point localization.
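To make the statistics concrete: a common way to combine first- and second-order information is to concatenate the mean descriptor with a normalized bilinear (outer-product) descriptor. The sketch below is illustrative only, showing that idea on a generic feature map; it is not the fusion architecture proposed in this work, and the function name and normalization choices are assumptions.

```python
import numpy as np

def fuse_texture_statistics(features):
    """Illustrative fusion of first- and second-order feature statistics.

    features: array of shape (H*W, C), one C-dim descriptor per location.
    Returns the mean descriptor (first order) concatenated with a
    flattened, signed-sqrt + L2-normalized bilinear descriptor
    (second order).
    """
    first_order = features.mean(axis=0)                    # (C,)
    # bilinear pooling: average outer product over locations -> (C, C)
    second_order = features.T @ features / len(features)
    second_order = second_order.flatten()
    # signed square root and L2 normalization, standard for
    # bilinear descriptors
    second_order = np.sign(second_order) * np.sqrt(np.abs(second_order))
    second_order /= (np.linalg.norm(second_order) + 1e-12)
    return np.concatenate([first_order, second_order])

# e.g. a 7x7 map of 8-dim descriptors yields an 8 + 64 = 72-dim vector
desc = fuse_texture_statistics(np.random.randn(49, 8))
```

The second-order term captures pair-wise channel interactions (useful for texture), while the first-order term preserves the overall activation level.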
In the temporal domain, explicitly modeling context can be crucial for activity localization. We propose a temporal context network that explicitly captures the relative context around a proposal by sampling features pair-wisely at two temporal scales, enabling precise temporal localization of human activities. Meanwhile, implicitly modeling context can lead to better network architectures for video applications. We then propose a temporal aggregation network that learns a deep hierarchical representation to capture spatio-temporal information.
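As a rough intuition for pair-wise two-scale sampling: frames can be sampled uniformly inside a proposal and, in parallel, inside an enlarged window with the same center, so each in-proposal sample is paired with one that sees surrounding context. The sketch below is a simplified illustration under that assumption, not the sampler of the proposed temporal context network; the function name and the scale factor are hypothetical.

```python
def sample_context_pair(start, end, scale=2.0, num_samples=4):
    """Sample frame indices at two temporal scales around a proposal.

    Returns (inner, outer): indices sampled uniformly inside the proposal
    [start, end) and inside an enlarged window with the same center whose
    length is `scale` times the proposal length. Pairing inner and outer
    samples lets a model compare a proposal against its temporal context.
    """
    center = (start + end) / 2.0
    length = end - start

    def uniform(lo, hi):
        # num_samples points at the centers of equal-width sub-intervals
        step = (hi - lo) / num_samples
        return [int(lo + step * (i + 0.5)) for i in range(num_samples)]

    inner = uniform(start, end)
    outer = uniform(center - scale * length / 2.0,
                    center + scale * length / 2.0)
    return inner, outer

# a proposal spanning frames [10, 18) with a 2x context window [6, 22)
inner, outer = sample_context_pair(10, 18)
```

Comparing features at the two scales gives a signal for whether the proposal boundaries are too tight or too loose, which is what makes such relative context useful for precise localization.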
As future research directions, we will discuss deep architectures for jointly learning spatial and temporal context and explore the possibility of solving both classes of applications with one unified deep model.
Dean's rep: Dr. Hector Corrada Bravo
Members: Dr. Rama Chellappa