PhD Proposal: Closing the Gap Between Classification and Retrieval Models
Ahmed Taha
Virtual
Thursday, April 30, 2020, 9:00-11:00 am
Abstract
A classification architecture with an embedding head enables searching and indexing. The embedding head, trained with a ranking loss, curbs the softmax loss's tendency to overfit by promoting a smooth embedding space. Toward this goal, we propose a two-head architecture trained with both softmax and ranking losses. The two-head architecture is simple yet effective in boosting both classification and feature-embedding performance. Related literature (e.g., center loss) proposes a similar formulation but always assumes a fixed number of modes per class. In our work, we alleviate this constraint using a semi-hard triplet loss, which allows a dynamic number of modes per class; this is vital when working with imbalanced data. We also refute the common assumption that training with a ranking loss is computationally expensive: by moving both triplet sampling and the loss computation to the GPU, training time increases by just 2%.
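As a concrete illustration, the following PyTorch sketch shows one way such a two-head architecture and a batch-wise semi-hard triplet loss could be implemented. The backbone choice (ResNet-50), embedding dimension, margin, and loss weight are illustrative assumptions rather than the proposal's exact configuration, and the mining loop is written for clarity instead of the fully vectorized GPU form that the 2% overhead figure implies.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torchvision

    class TwoHeadNet(nn.Module):
        """Shared backbone with a softmax (classification) head and an
        embedding (ranking) head; a sketch, not the authors' exact model."""
        def __init__(self, num_classes, embed_dim=128):
            super().__init__()
            backbone = torchvision.models.resnet50(weights=None)  # assumed backbone
            feat_dim = backbone.fc.in_features
            backbone.fc = nn.Identity()               # strip the original classifier
            self.backbone = backbone
            self.cls_head = nn.Linear(feat_dim, num_classes)  # trained with softmax
            self.emb_head = nn.Linear(feat_dim, embed_dim)    # trained with ranking loss

        def forward(self, x):
            feats = self.backbone(x)
            logits = self.cls_head(feats)
            embeds = F.normalize(self.emb_head(feats), dim=1)  # unit-norm embedding
            return logits, embeds

    def semi_hard_triplet_loss(embeds, labels, margin=0.2):
        """Batch-wise semi-hard mining on GPU tensors (loop kept for clarity;
        assumes each batch contains at least two classes)."""
        dist = torch.cdist(embeds, embeds)                     # pairwise distances
        eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
        pos_mask = (labels[:, None] == labels[None, :]) & ~eye
        neg_mask = labels[:, None] != labels[None, :]
        losses = []
        for a in range(len(labels)):
            for p in pos_mask[a].nonzero(as_tuple=True)[0]:
                d_ap = dist[a, p]
                neg_d = dist[a][neg_mask[a]]
                semi = neg_d[neg_d > d_ap]      # negatives farther than the positive
                d_an = semi.min() if len(semi) > 0 else neg_d.max()
                losses.append(F.relu(d_ap - d_an + margin))
        return torch.stack(losses).mean() if losses else embeds.new_zeros(())

    # Joint objective: softmax loss plus the ranking loss on the same batch.
    # logits, embeds = model(images)
    # loss = F.cross_entropy(logits, labels) + semi_hard_triplet_loss(embeds, labels)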

Due to classification architectures' popularity, recent literature supports them with a toolbox of utilities. For instance, class activation maps (CAM) can visualize a classification network's attention, and for models trained with softmax loss, it is possible to model both network and data uncertainty. These visualization and uncertainty tools are essential for promoting these architectures into safety-critical domains. To promote retrieval models' usability, we propose a Bayesian triplet loss that models both network and data uncertainty. In addition, we propose a generic tool to visualize attention for various classification and retrieval networks.

Compared to classification networks, retrieval networks' uncertainty estimation is hardly studied. We propose a Bayesian triplet loss to model retrieval networks' uncertainties. Inspired by the uncertainty formulation used in regression tasks, we cast the triplet loss as a tri-variable regression function. Through this casting, and using dropout layers, we model the network uncertainty, i.e., epistemic uncertainty. We further reformulate the vanilla triplet loss to model data uncertainty (heteroscedastic uncertainty). The new formulation provides an uncertainty estimate conditioned on the input image. This is achieved with minimal computational cost and without uncertainty annotation.
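A minimal sketch of how this reformulation could look in PyTorch follows, using the heteroscedastic regression loss of Kendall and Gal (2017) as the template. The log-variance head, the dropout rate, and the choice to attenuate the triplet residual with the anchor's predicted variance are all illustrative assumptions; the proposal's exact combination rule may differ.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UncertainEmbedNet(nn.Module):
        """Embedding head plus a per-image log-variance head (a sketch)."""
        def __init__(self, backbone, feat_dim, embed_dim=128):
            super().__init__()
            self.backbone = backbone
            self.dropout = nn.Dropout(p=0.2)           # kept active for MC sampling
            self.emb_head = nn.Linear(feat_dim, embed_dim)
            self.logvar_head = nn.Linear(feat_dim, 1)  # data uncertainty per image

        def forward(self, x):
            feats = self.dropout(self.backbone(x))
            emb = F.normalize(self.emb_head(feats), dim=1)
            log_var = self.logvar_head(feats).squeeze(1)
            return emb, log_var

    def heteroscedastic_triplet_loss(a, p, n, log_var, margin=0.2):
        """Triplet loss cast as a regression over (anchor, positive, negative).

        The residual is attenuated by the predicted variance; the log-variance
        term keeps the network from predicting infinite uncertainty everywhere
        (Kendall & Gal style). Here log_var is the anchor's prediction."""
        d_ap = (a - p).pow(2).sum(1)
        d_an = (a - n).pow(2).sum(1)
        residual = F.relu(d_ap - d_an + margin)
        return (0.5 * torch.exp(-log_var) * residual + 0.5 * log_var).mean()

    def mc_dropout_embeddings(model, x, n_samples=20):
        """Network (epistemic) uncertainty: embed x repeatedly with dropout on."""
        model.train()                                  # keeps dropout stochastic
        with torch.no_grad():
            samples = torch.stack([model(x)[0] for _ in range(n_samples)])
        return samples.mean(0), samples.var(0)         # predictive mean / variance

Note that no uncertainty annotation appears anywhere: the log-variance head is trained purely through the loss itself, which is consistent with the abstract's claim of minimal computational cost.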

We also illustrate how the Bayesian triplet loss boosts retrieval networks' robustness when trained with noisy data. Finally, we leverage the L2-Norm as a visualization tool to localize attention in both classification and retrieval networks. This approach imposes no constraints on the network architecture beyond having a convolutional layer. The input can be a regular image or a pre-extracted convolutional feature. The network output can be logits trained with softmax or a space embedding trained with a ranking loss. Furthermore, this approach neither changes the original network weights nor requires fine-tuning, so network performance remains intact. The visualization filter is applied only when an attention map is required and thus poses no computational overhead during inference. It can also visualize the attention of intermediate layers, and it visualizes the attention of the last convolutional layer of GoogLeNet within 0.3 seconds.
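The sketch below shows the essence of such an L2-Norm visualization in PyTorch: a forward hook grabs a chosen convolutional feature map, and the per-location channel-wise L2 norm, normalized and upsampled, serves as the attention map. The hook mechanism, layer choice, and min-max normalization are illustrative assumptions; the untrained GoogLeNet weights are only there to keep the sketch self-contained.

    import torch
    import torch.nn.functional as F
    import torchvision

    def l2_norm_attention(model, layer, image):
        """Attention from the channel-wise L2 norm of a conv feature map.

        A forward hook reads the activations, so the network weights are
        untouched, no fine-tuning is needed, and the map is computed only
        when requested."""
        feats = {}
        handle = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
        with torch.no_grad():
            model(image)
        handle.remove()
        fmap = feats["out"]                          # shape (1, C, H, W)
        attn = fmap.norm(p=2, dim=1, keepdim=True)   # L2 norm over channels
        attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
        # Upsample to the input resolution for overlaying on the image.
        return F.interpolate(attn, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)

    # Example: last convolutional block of GoogLeNet, as in the abstract
    # (weights=None keeps this runnable offline; pretrained weights would
    # be loaded in practice).
    net = torchvision.models.googlenet(weights=None, init_weights=True).eval()
    heat = l2_norm_attention(net, net.inception5b, torch.randn(1, 3, 224, 224))

Because the hook works on any module that outputs a spatial feature map, the same function applies unchanged to classification and retrieval networks alike.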

Examining Committee:

    Chair:     Dr. Larry Davis
    Dept rep:  Dr. Tom Goldstein
    Members:   Dr. Abhinav Shrivastava
               Dr. David Jacobs
Bio

Ahmed Taha is a fifth-year Ph.D. student in CS at the University of Maryland, College Park, under the supervision of Prof. Larry S. Davis. His research focuses on deep metric learning, including feature embedding, uncertainty, and visualization.

This talk is organized by Tom Hurst