With the emergence of large datasets and deep learning in recent years, learning-based methods play an increasingly important role in computer vision, and deep Convolutional Neural Networks (CNNs) now dominate most problems in the field. Despite their success, deep CNNs are notoriously data-hungry compared with traditional learning-based methods. Consider image classification as an example: to train a deep CNN that performs reasonably well, millions of labeled images must be fed into the network. While collecting images from the internet is easy and fast, labeling them is time-consuming and expensive, and sometimes even impossible. In this proposal, we study how to alleviate the demand for massive labeled data in training deep CNNs.
First, we study the capacity of deep CNNs. Designing deep CNNs with less capacity and good generalization is one way to reduce the amount of labeled data needed for training, and understanding the capacity of deep CNNs is the first step towards that goal. In this work, we empirically study the capacity of deep CNNs by examining the redundancy of their parameters. More specifically, we aim to optimize the number of neurons in a network, and thus the number of parameters. To achieve that goal, we incorporate sparse constraints into the objective function and apply a forward-backward splitting method to solve the resulting sparse constrained optimization problem efficiently. We also investigate the importance of rectified linear units (ReLU) in sparse constrained CNNs, showing that using ReLU can lead to more pruned neurons. We study two sparse constraints, tensor low rank and group sparsity, and carry out experiments on four well-known models (LeNet, CIFAR-10 quick, AlexNet, and VGG) using three public datasets, including ImageNet. Our experiments demonstrate that the proposed method can significantly reduce the number of parameters during the training stage, showing that a network with small capacity can work well.
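As an illustration of the forward-backward splitting idea mentioned above, the following is a minimal NumPy sketch on a toy linear model, not the implementation used in this work. It assumes a row-wise group sparsity penalty in which each row of the weight matrix plays the role of one neuron's weights; the function names, the least-squares loss, and the values of `lam` and `n_iters` are all hypothetical choices made for the example.

```python
import numpy as np

def group_soft_threshold(W, tau):
    """Proximal operator of tau * sum_i ||W[i, :]||_2 (row-wise group sparsity)."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Rows with norm <= tau are set exactly to zero; others are shrunk toward zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale

def forward_backward(X, Y, lam=0.5, n_iters=500):
    """Minimize 0.5/n * ||X W - Y||_F^2 + lam * sum_i ||W[i, :]||_2."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iters):
        grad = X.T @ (X @ W - Y) / n                            # forward (gradient) step
        W = group_soft_threshold(W - step * grad, step * lam)   # backward (proximal) step
    return W

# Toy data (assumed for illustration): only the first 3 of 10 input units matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
W_true = np.zeros((10, 3))
W_true[:3] = rng.standard_normal((3, 3))
Y = X @ W_true + 0.01 * rng.standard_normal((200, 3))

W_hat = forward_backward(X, Y)
# Rows that are exactly zero correspond to units that could be pruned.
pruned_rows = int(np.sum(np.linalg.norm(W_hat, axis=1) == 0.0))
```

Because the proximal step sets whole rows exactly to zero rather than merely making them small, the units to remove can be read off directly from the zero rows once optimization finishes.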
Second, we explore how to use synthetic data to train a deep CNN when no ground truth labels are available for real data. We study an important problem in computer vision: inverse lighting from a single face image. Lacking large-scale ground truth lighting labels, we generate a large amount of synthetic data with ground truth lighting to train a deep network. However, due to the large domain gap between real and synthetic data, a network trained on synthetic data alone does not generalize well to real data. We therefore propose to train the deep CNN on real data together with synthetic data. We apply an existing method to estimate the lighting conditions of real face images; however, these lighting labels are noisy. We then propose a Label Denoising Adversarial Network (LDAN) that uses the synthetic data to help train a deep CNN to regress lighting from real face images while denoising the labels of the real images. We show that the proposed method produces more consistent lighting estimates for faces taken under the same lighting condition.
Finally, we discuss future work on alleviating the demand for massive labeled data when training deep CNNs in other applications.
Dean's rep: Dr. Thomas Goldstein
Members: Dr. Larry S. Davis