Protecting Privacy & Guaranteeing Generalization by Controlling Information
Thomas Steinke - IBM Almaden
Wednesday, April 4, 2018, 10:00-11:00 am
Abstract

As data is collected and used ever more widely, privacy is becoming increasingly difficult to protect and statistical validity increasingly difficult to ensure. Sound solutions are needed, as ad hoc approaches have led to several high-profile failures.

In this talk, I will illustrate how privacy can be unwittingly compromised -- i.e., sensitive information can be leaked by seemingly innocuous "anonymized" or aggregate data. I will then show how differential privacy avoids these pitfalls. Differential privacy is an information-theoretic notion of algorithmic stability that provides a framework for measuring the leakage of private information and, most importantly, how this information accumulates over multiple uses of an individual's data. This allows us to design algorithms to perform sophisticated statistical analyses, while providing robust privacy guarantees.
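As a concrete illustration of the framework described above, the following sketch shows the classic Laplace mechanism for releasing a count. This is a standard textbook example, not code from the talk; the function names and parameters are my own. Adding or removing one person's record changes a counting query by at most 1 (its sensitivity), so adding Laplace noise with scale sensitivity/epsilon yields epsilon-differential privacy.

```python
import math
import random

def laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5):
    """Release a statistic with Laplace noise scaled to sensitivity/epsilon.

    Illustrative sketch: one individual's data shifts the true count by at
    most `sensitivity`, so noise of scale sensitivity/epsilon masks any
    single person's contribution (epsilon-differential privacy).
    """
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) by inverse-CDF applied to a uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Example: privately release how many of 1000 records satisfy a predicate.
data = [random.random() < 0.3 for _ in range(1000)]
true_count = sum(data)
private_count = laplace_mechanism(true_count, sensitivity=1, epsilon=0.5)
```

The key property stressed in the abstract is composition: running this mechanism k times leaks at most k·epsilon in total, so the leakage "budget" can be tracked across multiple uses of the same individual's data.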

Privacy turns out to be intimately related to generalization in machine learning. In particular, a differentially private algorithm is guaranteed to not "overfit" its data, meaning that any statistical conclusions extend to the underlying distribution from which the data was drawn. I will discuss this connection and explain how it is especially useful for adaptive data analysis, namely when one dataset is used over and over again and each successive analysis is informed by the outcome of previous analyses.
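The overfitting danger in adaptive reuse of a dataset can be seen in a small toy experiment (my own illustration, not material from the talk). The labels below are fair coins, independent of every attribute, so every signed combination of attributes has true correlation 0 with the label; yet an analyst who first looks at the sample to pick the sign of each attribute, and then evaluates the combined predictor on the *same* sample, sees a clearly positive score.

```python
import random

def sample_mean(xs):
    return sum(xs) / len(xs)

random.seed(1)
n, d = 200, 50
# Attributes and labels are independent fair +/-1 coins: the true
# correlation of any fixed +/-1-weighted combination with the label is 0.
data = [[random.choice([-1, 1]) for _ in range(d)] for _ in range(n)]
labels = [random.choice([-1, 1]) for _ in range(n)]

# Adaptive analyst: flip each attribute's sign to agree with the label
# on this sample (first look at the data)...
signs = []
for j in range(d):
    corr = sample_mean([data[i][j] * labels[i] for i in range(n)])
    signs.append(1 if corr >= 0 else -1)

# ...then evaluate the averaged predictor on the SAME sample (second look).
score = sample_mean([
    labels[i] * sum(signs[j] * data[i][j] for j in range(d)) / d
    for i in range(n)
])
# `score` comes out visibly positive even though the true value is 0:
# the sign choices overfit the very data used to evaluate them.
```

A differentially private analyst would answer each correlation query with noise, which provably limits how much the selected signs can encode about this particular sample and keeps the final score close to its true value of 0.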

Bio

Thomas Steinke is a postdoctoral researcher at the IBM Almaden Research Center in San Jose, California. In 2016, he graduated from Harvard University with a PhD in Computer Science advised by Salil Vadhan; prior to that, he completed an MSc and a BSc(Hons) at the University of Canterbury in New Zealand. His research interests include providing rigorous tools for privacy-preserving data analysis and statistically valid adaptive data analysis, as well as pseudorandomness.

This talk is organized by Jonathan Katz.