Neural networks used in practice have millions of parameters, and yet they generalize well even when they are trained on small datasets. While there exist networks with zero training error and a large test error, the optimization algorithms used in practice magically find networks that generalize well to test data. How can we characterize such networks? What are the properties of networks that generalize well? How do these properties ensure generalization?
In this talk, we will develop techniques to understand generalization in neural networks. Towards the end, I will show how this understanding can help us design architectures and optimization algorithms with better generalization performance.
Behnam Neyshabur is a postdoctoral researcher in Yann LeCun’s group at New York University. Before that, he was a member of the Theoretical Machine Learning program led by Sanjeev Arora at the Institute for Advanced Study (IAS) in Princeton. In summer 2017, he received a PhD in computer science from TTI-Chicago, where Nati Srebro was his advisor. He is interested in machine learning and optimization, and his primary research is on optimization and generalization in deep learning.