Modern NLP is dominated by scale: today's language models (LMs) use enormous parameter counts, dataset sizes, and compute budgets. In this talk, I will show that large LMs "memorize" their training data in various settings. This can sometimes be beneficial, e.g., memorization allows models to learn and recall knowledge from their pre-training data when solving downstream tasks. On the other hand, memorization can raise legal concerns (e.g., generating copyrighted text or regurgitating medical documents), and an over-reliance on memorization can lead to reasoning failures on novel tasks and inputs. Throughout the talk, I will place particular focus on actionable insights that we can derive from these analyses, especially with respect to training strategies, model architectures, and dataset design.
Papers discussed: https://arxiv.org/abs/2012.078
Eric Wallace is a 4th-year PhD student at UC Berkeley, advised by Dan Klein and Dawn Song. His research interests are in making large language models more robust, trustworthy, secure, and private. Eric's work is supported by the Apple Fellowship in AI/ML, and he has previously worked at FAIR, AI2, and the University of Maryland.