PhD Defense: Closing the Gap Between Classification and Retrieval Models
Ahmed Taha
Thursday, March 25, 2021, 9:00-11:00 am
Retrieval networks learn a feature embedding where similar samples are close together and dissimilar samples are far apart. This feature embedding is essential for computer vision applications like face/person recognition, zero-shot learning, and image retrieval. Despite these important applications, retrieval networks are less popular than classification networks for multiple reasons: (1) The cross-entropy loss -- used with classification networks -- is more stable and converges faster than the metric learning losses used with retrieval networks. (2) The cross-entropy loss has a huge toolbox of utilities and extensions. For instance, both AdaCos and self-knowledge distillation have been proposed to tackle low sample complexity in classification networks; likewise, both CAM and Grad-CAM have been proposed to visualize attention in classification networks. To promote retrieval networks, it is important to equip them with an equally powerful toolbox. Accordingly, we propose an evolution-inspired approach to tackle low sample complexity in feature embedding. Then, we propose SVMax to regularize the feature embedding and avoid model collapse. Furthermore, we propose L2-CAF to visualize attention in retrieval networks.

To tackle low sample complexity, we propose an evolution-inspired training approach to boost performance on relatively small datasets. The knowledge evolution (KE) approach splits a deep network into two hypotheses: the fit-hypothesis and the reset-hypothesis. We iteratively evolve the knowledge inside the fit-hypothesis by perturbing the reset-hypothesis for multiple generations. This approach not only boosts performance but also learns a slim (pruned) network with a smaller inference cost. KE reduces both overfitting and the burden of data collection.
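The generational loop above can be sketched in a few lines. This is a toy numpy illustration, not the authors' implementation: the random weight-level split, the split fraction, and the re-initialization scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_hypotheses(shape, fit_fraction=0.5):
    """Random binary mask over a weight tensor: True entries belong to
    the fit-hypothesis, False entries to the reset-hypothesis."""
    mask = rng.random(shape) < fit_fraction
    mask.flat[0], mask.flat[1] = True, False  # toy guard: keep both hypotheses nonempty
    return mask

def evolve_generation(weights, mask):
    """Start the next generation: keep the fit-hypothesis weights and
    re-initialize (perturb) the reset-hypothesis weights."""
    fresh = rng.normal(scale=0.01, size=weights.shape)
    return np.where(mask, weights, fresh)

# Toy layer, pretending it was just trained for one generation.
w = rng.normal(size=(4, 4))
mask = split_hypotheses(w.shape)
w_next = evolve_generation(w, mask)

assert np.allclose(w_next[mask], w[mask])  # fit-hypothesis knowledge survives
assert not np.array_equal(w_next, w)       # reset-hypothesis was perturbed
```

After many generations, the knowledge concentrates inside the fit-hypothesis, which is why the fit-hypothesis alone can serve as a slim network at inference time.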

To regularize the feature embedding and avoid model collapse, we propose singular value maximization (SVMax) to promote a uniform feature embedding. Our formulation mitigates model collapse and enables larger learning rates. SVMax is oblivious to both the input class (labels) and the sampling strategy; thus, it promotes a uniform feature embedding in both supervised and unsupervised learning. Furthermore, we present a mathematical analysis of the mean singular value's lower and upper bounds. This analysis makes it easier to tune SVMax's balancing hyperparameter when the feature embedding is normalized to the unit circle.
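The core quantity behind SVMax can be illustrated directly: the mean singular value of a batch's (L2-normalized) embedding matrix is large when the embedding is spread out and small when it collapses. A minimal numpy sketch, assuming the regularizer is subtracted from the training loss with a balancing weight `lam` (batch size, dimensionality, and `lam` here are illustrative):

```python
import numpy as np

def svmax_term(embeddings, lam=1.0):
    """Mean singular value of the L2-normalized batch embedding matrix.
    Subtracting lam * s_mean from the loss rewards a uniform embedding."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    s = np.linalg.svd(e, compute_uv=False)
    return lam * s.mean()

rng = np.random.default_rng(0)
spread = rng.normal(size=(32, 8))                       # well-spread batch
collapsed = np.tile(rng.normal(size=(1, 8)), (32, 1))   # every sample maps to one point

# A collapsed embedding has rank 1, so its mean singular value is small;
# maximizing the mean singular value therefore pushes away from collapse.
assert svmax_term(spread) > svmax_term(collapsed)
```

Because this term depends only on the batch embedding matrix, it needs neither labels nor a particular sampling strategy, which is what makes it applicable in both supervised and unsupervised settings.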

To support retrieval networks with a visualization tool, we formulate attention visualization as a constrained optimization problem. We leverage the unit L2-Norm constraint as an attention filter (L2-CAF) to localize attention in both classification and retrieval networks. This approach imposes no constraints on the network architecture besides having a convolution layer. The input can be a regular image or a pre-extracted convolutional feature. The network output can be logits trained with cross-entropy or a space embedding trained with a ranking loss. Furthermore, this approach neither changes the original network weights nor requires fine-tuning. Thus, network performance remains intact. The visualization filter is applied only when an attention map is required. Thus, it poses no computational overhead during inference. L2-CAF visualizes the attention of the last convolutional layer of GoogLeNet within 0.3 seconds.
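The constrained-optimization view of L2-CAF can be sketched on a toy network tail. This is an illustrative numpy reconstruction under simplifying assumptions (a tiny feature map, a global-average-pool + linear head, and plain projected gradient descent), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network tail: last conv feature map -> global average pool -> linear head.
H, W, C, K = 4, 4, 8, 3
feats = rng.normal(size=(H, W, C))   # stand-in for conv features
w_head = rng.normal(size=(C, K))     # stand-in for the head weights

# For this head, the output is linear in a spatial filter f: logits = M @ f.
M = np.einsum('ijc,ck->kij', feats, w_head).reshape(K, H * W) / (H * W)
target = M @ np.ones(H * W)          # original (unfiltered) network output

def l2caf(M, target, steps=300, lr=0.1):
    """L2-CAF sketch: find the unit-L2-norm spatial filter whose filtered
    output best matches the original output, via projected gradient descent.
    The filter itself is the attention map; weights stay untouched."""
    f = np.ones(M.shape[1])
    f /= np.linalg.norm(f)
    for _ in range(steps):
        grad = 2.0 * M.T @ (M @ f - target)
        f -= lr * grad
        f /= np.linalg.norm(f)       # project back onto the unit L2 sphere
    return f.reshape(H, W)

att = l2caf(M, target)
assert abs(np.linalg.norm(att) - 1.0) < 1e-9  # the unit-norm constraint holds
```

Because the optimization touches only the auxiliary filter, the sketch mirrors the key properties stated above: the network weights are never modified, and the filter is solved for only when an attention map is requested.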

Examining Committee:
    Chair:                 Dr. Larry Davis
    Dean's Representative: Dr. Behtash Babadi
    Members:               Dr. Abhinav Shrivastava
                           Dr. David Jacobs
                           Dr. Ramani Duraiswami
                           Dr. Tom Goldstein

Ahmed Taha is a CS Ph.D. student at the University of Maryland, College Park, under the supervision of Prof. Larry S. Davis and Abhinav Shrivastava. His research focuses on deep metric learning, representation learning, and visualization.

This talk is organized by Tom Hurst