Code Large Language Models (LLMs) such as GitHub Copilot, Amazon CodeWhisperer, and Code Llama have revolutionized software development, boosting the productivity of millions of developers. It is estimated that by 2028, 75% of enterprise software engineers will rely on AI coding assistants. However, Code LLMs raise significant security concerns: studies indicate that 40% of programs generated by GitHub Copilot are vulnerable. Thus, there is an urgent need to ensure the safety of Code LLMs.
In this lecture, I will begin by reviewing the traditional methods used to evaluate the functional correctness and security of code produced by LLMs, pointing out the inherent limitations of these approaches. I will then introduce a new benchmark and new metrics we've developed to more accurately assess the correctness and security of Code LLMs. I will delve into how Code LLMs generate code and present our research on the security of state-of-the-art models. Finally, I will discuss future research directions and challenges in generating secure code.
Yizheng Chen is an Assistant Professor of Computer Science at the University of Maryland. She works at the intersection of AI and security, and her research focuses on Code Large Language Models and AI security. Previously, she received her Ph.D. in Computer Science from the Georgia Institute of Technology and was a postdoctoral researcher at the University of California, Berkeley, and at Columbia University. Her work has received an ACM CCS Best Paper Award Runner-up and a Google ASPIRE Award, and was a Top-10 Finalist in the CSAW Applied Research Competition. She is a recipient of the Anita Borg Memorial Scholarship. Her recent work has been adopted by Google DeepMind, Amazon CodeWhisperer, and OpenAI.