PhD Proposal: Improving Model and Data Efficiency for Deep Learning
Renkun Ni
Thursday, April 28, 2022, 9:00-11:00 am
Abstract
Deep learning has achieved or even surpassed human-level performance on a wide range of challenging computer vision tasks. However, this success is usually obtained by training huge models on numerous labeled examples, which requires considerable computational resources and incurs high data collection costs. A number of works have been proposed to improve efficiency on both fronts. On one hand, significant progress has been made in accelerating inference through methods such as quantization and pruning. On the other hand, few-shot learning and self-supervised learning have gathered increasing attention due to their ability to learn feature representations from few labeled examples or even without human supervision. In this work, we propose several improvements to, and further analyses of, these techniques.

To improve model efficiency, we investigate the problem caused by integer overflow when low-resolution arithmetic is applied in existing quantization methods. We find that integer overflow occurs frequently, and that even a small number of overflows can destroy the performance of a trained deep neural network, which limits further acceleration. To allow networks to perform reasonably under low-resolution arithmetic, we propose WrapNet, an architecture that mimics the "wrap-around" property of integer overflow and achieves comparable performance with extremely low-resolution (8-bit) accumulators.
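To make the wrap-around behavior concrete, the following is a minimal, purely illustrative Python sketch (our own, not code from the proposal): it computes a dot product with a signed low-bit accumulator that wraps on overflow, the arithmetic regime that WrapNet is designed to tolerate. The function name and toy data are assumptions for illustration.

import numpy as np

def wraparound_dot(x, w, acc_bits=8):
    # Dot product with a signed acc_bits-wide accumulator that wraps on
    # overflow (modular arithmetic), rather than a wide 32-bit accumulator.
    lo, hi = -(1 << (acc_bits - 1)), 1 << (acc_bits - 1)   # e.g. [-128, 128) for 8 bits
    acc = 0
    for xi, wi in zip(x, w):
        acc += int(xi) * int(wi)
        acc = (acc - lo) % (hi - lo) + lo                  # wrap into the signed range
    return acc

rng = np.random.default_rng(0)
x = rng.integers(-8, 8, size=64)   # toy low-bit activations
w = rng.integers(-8, 8, size=64)   # toy low-bit weights
print("full-precision accumulator:", int(np.dot(x, w)))
print("8-bit wrap-around accumulator:", wraparound_dot(x, w))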

On the side of label efficiency, we develop a better understanding of models trained by meta-learning, which follows a unique training pipeline, for few-shot classification tasks. Because of this unique pipeline, we find that data augmentation methods that are helpful in regular supervised learning may actually hurt the few-shot performance of meta-learners. We conduct a comprehensive analysis of how to incorporate data augmentation strategies into the meta-learning pipeline and propose Meta-MaxUp, a data augmentation method for meta-learning that improves few-shot performance on multiple benchmarks.
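As a rough illustration of the MaxUp-style idea behind Meta-MaxUp (a sketch under our own assumptions, not the proposal's implementation): sample several augmented versions of a few-shot episode, evaluate the meta-learning loss on each, and optimize only the worst case. The helpers meta_loss and augmentations below are hypothetical placeholders.

import random

def meta_maxup_step(model, task, meta_loss, augmentations, m=4):
    # Sample m augmented versions of the few-shot episode (support + query),
    # compute the meta-learning loss on each, and back-propagate only the
    # maximum (worst-case) loss, following the MaxUp principle.
    losses = []
    for _ in range(m):
        aug = random.choice(augmentations)   # hypothetical augmentation callables
        losses.append(meta_loss(model, aug(task)))
    worst = max(losses)                      # hardest augmented episode
    worst.backward()                         # assumes PyTorch-style loss tensors
    return worst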

Besides few-shot learning, we explore applying meta-learning methods to self-supervised learning. We discuss the close relationship, under a certain task distribution, between meta-learning and contrastive learning, a method that achieves excellent results in self-supervised learning. We show that meta-learning methods can achieve linear-probe performance comparable to contrastive learning, with better transferability. In addition, task-based data augmentations from meta-learning can also be applied to improve the self-supervised performance of contrastive learning.
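One way to see this connection (a minimal sketch using our own naming and assuming PyTorch, not code from the proposal): treat one augmented view of each image as a one-shot "support" prototype and the other view as the "query"; the resulting N-way nearest-prototype loss takes the same form as a standard InfoNCE contrastive loss.

import torch
import torch.nn.functional as F

def contrastive_as_protonet_loss(z1, z2, temperature=0.1):
    # z1, z2: (N, d) embeddings of two augmented views of the same N images.
    # View 1 acts as the 1-shot support set (prototypes); view 2 as the queries.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z2 @ z1.t() / temperature        # query-to-prototype similarities
    labels = torch.arange(z1.size(0))         # query i matches prototype i
    return F.cross_entropy(logits, labels)    # identical in form to InfoNCE

# Toy usage with random embeddings standing in for a backbone's outputs.
z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
print(contrastive_as_protonet_loss(z1, z2))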

In future work, we will focus on developing additional techniques to improve model efficiency, especially for high-resolution inputs. We will also attempt to improve the training efficiency of neural networks.

Examining Committee:
Chair: Dr. Tom Goldstein
Department Representative: Dr. John Dickerson
Dr. Furong Huang
Bio

Renkun Ni is a PhD student in the Department of Computer Science at the University of Maryland, College Park, advised by Professor Tom Goldstein. His research focuses on efficient deep learning, few-shot learning, and self-supervised learning.

This talk is organized by Tom Hurst