Talks

Rich Representations for Detailed Visual Recognition

Subhransu Maji - TTI Chicago

Thursday, April 4, 2013, 11:00 am-12:00 pm

Abstract

As humans, we have the remarkable ability to perceive the world around us in minute detail: we can estimate material and metric properties of objects, localize people in images, describe what they are doing, and even identify them! Despite the many successes of computer vision over the past two decades, methods for reliably inferring such fine-grained properties from images are lacking. I'll describe our attempts to develop computational models for such detailed recognition by improving the interplay of data, computation and humans in three ways.

First, I'll present computationally efficient classifiers for visual recognition, which are a key ingredient of many recognition systems. These classifiers approximate the non-linear kernel SVMs that are widely used in computer vision while being exponentially faster during training and testing, which makes them practical for large-scale recognition or detection tasks. Second, I'll show how humans can enable better and more interpretable models for detailed recognition. Visual categories are decomposed using novel parts called "poselets" that are semantically aligned to human annotations. These provide a basis for high-level recognition and lead to simple, accurate and interpretable architectures for learning and recognition. The proposed models rely on annotations of landmarks and attributes during learning. However, deciding on the right set of landmarks or attributes to annotate can be a challenging task. In the third part of the talk, I'll present a relative annotation framework that overcomes some of the shortcomings of traditional annotation methods while enabling the discovery of rich semantic structure within a category when the set of annotation labels is not known ahead of time. I'll present experiments on semantic part and attribute discovery for visually diverse categories such as buildings, airplanes and birds.

Bio

Subhransu Maji received the BTech degree in computer science and engineering from the Indian Institute of Technology, Kanpur, in 2006, and the PhD degree in computer science from the University of California, Berkeley, in 2011. He is currently a research assistant professor at TTI Chicago. Earlier, he was an intern in Google's image search group and INRIA's LEAR group, and a visiting researcher at Microsoft Research India and the CLSP center at Johns Hopkins University. He received the medal for the best graduating student in the computer science department from IIT Kanpur. He was one of the recipients of the Google graduate fellowship in 2008 and received a best paper award at ICIF 2009. His primary interests are in computer vision and machine learning, with a focus on representations and efficient algorithms for rich visual recognition.

This talk is organized by Adelaide Findlay.