Analyzing Programs in the Era of Software 2.0
Xin Zhang
IRB 4105
Thursday, February 27, 2020, 11:00 am-12:00 pm
Abstract

With the software industry experiencing a major shift to machine learning, the programming systems community is facing both opportunities and challenges. On one hand, advances in machine learning provide new toolkits to build better programming systems to ensure software quality. On the other hand, as machine learning programs are increasingly being used in critical applications, it is now paramount to ensure their quality as well. In this talk, I will describe a set of new analysis techniques that address these opportunities and challenges.

First, I will talk about a data-driven framework for improving program analyses. It enables both online and offline learning by incorporating probabilities into the representation, which is conventionally only logical. While the logical part still encodes the expert knowledge from the analysis designer and ensures correctness, the probabilistic part offers new abilities to handle uncertainties. Our approach reduces the number of false positives by 70% for foundational program analyses like data race detection and pointer analysis. In addition, our inference engine can solve problems containing up to 10^30 clauses from various domains including program analysis, statistical AI, and Big Data analytics.
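To give a flavor of how logical and probabilistic parts can be combined, the following sketch (a simplification with made-up facts and weights, not the framework from the talk) treats an analysis as weighted clauses: hard clauses encode the designer's rules and must hold, soft clauses encode uncertain evidence such as user feedback, and inference picks the assignment that maximizes the total weight of satisfied soft clauses.

# Minimal sketch; all fact names and weights below are illustrative assumptions.
from itertools import product

VARS = ["race_ab", "alias_xy"]            # hypothetical derived facts

# Hard clause: if x and y may alias, a race between a and b is derivable.
def hard_ok(m):
    return (not m["alias_xy"]) or m["race_ab"]

# Soft clauses: (weight, satisfied-predicate). In the real system, weights
# would come from learning or user feedback; here they are made up.
SOFT = [
    (2.0, lambda m: m["alias_xy"]),       # pointer analysis suggests aliasing
    (5.0, lambda m: not m["race_ab"]),    # user marked this race a false alarm
]

def map_inference():
    """Exhaustive MAP inference over the tiny variable set."""
    best, best_score = None, float("-inf")
    for values in product([False, True], repeat=len(VARS)):
        m = dict(zip(VARS, values))
        if not hard_ok(m):                # hard clauses must hold
            continue
        score = sum(w for w, sat in SOFT if sat(m))
        if score > best_score:
            best, best_score = m, score
    return best, best_score

if __name__ == "__main__":
    # With these weights the false alarm is suppressed:
    # ({'race_ab': False, 'alias_xy': False}, 5.0)
    print(map_inference())

A real inference engine would, of course, use a weighted-SAT or similar solver rather than enumeration, which is how instances with up to 10^30 clauses become tractable.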

While existing program analyses work well with conventional programs, they cannot be applied to the novel properties that arise in machine learning. To address this challenge, we have developed program analyses for emerging properties such as interpretability and fairness. Our interpretability analysis is the first to provide corrections as actionable feedback on judgments made by a neural network. And our fairness analysis can scale to models that are more than five orders of magnitude larger than the largest previously verified model. To enable building machine learning programs that satisfy these properties by construction, we have also developed a probabilistic programming language that supports distributional inference and causal inference.
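As an illustration of the kind of property such a fairness analysis targets, the sketch below estimates demographic parity (similar positive rates across groups) by sampling. It is only illustrative: the analysis in the talk is a static verification technique rather than a sampling-based audit, and every distribution, threshold, and name here is an assumption.

# Minimal sketch of a demographic-parity check; not the talk's technique.
import random

def population():
    """Sample one individual: (group, feature); distributions are assumed."""
    group = random.random() < 0.5                    # protected attribute
    feature = random.gauss(1.0 if group else 0.0, 1.0)
    return group, feature

def model(feature):
    """A hypothetical classifier under audit."""
    return feature > 0.5

def positive_rates(n=100_000):
    """Estimate the model's positive rate within each group."""
    hits = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        g, x = population()
        counts[g] += 1
        hits[g] += model(x)
    return {g: hits[g] / counts[g] for g in counts}

if __name__ == "__main__":
    rates = positive_rates()
    ratio = min(rates.values()) / max(rates.values())
    print(rates, "fair under the 80% rule?", ratio >= 0.8)

A static analysis certifies such a property over the model and input distribution symbolically, which is what allows it to scale far beyond what sampling-based auditing or prior verifiers could handle.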
Bio

Xin Zhang is a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. His research areas are programming languages and software engineering, with a focus on the interplay between programming systems and machine learning. On one hand, he leverages machine learning ideas to improve the usability of programming systems. On the other hand, he develops new analyses and languages to ensure the quality of machine learning programs. His work has received Distinguished Paper Awards at PLDI'14 and FSE'15. Xin received his Ph.D. from Georgia Tech in 2017; his doctoral work was partly supported by a Facebook Fellowship.

This talk is organized by Richa Mathur