PhD Proposal: Extending the Scope of Provable Adversarial Robustness in Machine Learning
Aounon Kumar
Remote
Wednesday, February 9, 2022, 11:00 am-1:00 pm
Abstract
The study of certified defenses against adversarial attacks in machine learning has mostly been limited to classification tasks and static one-step adversaries. The goal of this work is to expand the set of provable robustness techniques to cover more general real-world settings such as models with structured outputs (e.g., images, sets and segmentation masks) and adaptive multi-step adversaries (e.g., in reinforcement learning). Generating robust systems with provable guarantees in these settings is often challenging as existing methods, developed for classification models, cannot be easily applied. For instance, structured outputs like images and segmentation masks cannot be treated as elements of a discrete set of classes in a meaningful way.

First, we develop a randomized smoothing-based algorithm to produce verifiably robust models for problems with structured outputs. Many machine learning problems, such as image segmentation, object detection and image/audio-to-text systems, fall under this category. Our procedure works by evaluating the base model on a collection of noisy versions of the input point and aggregating the predictions by computing the center of the smallest ball that covers at least half of the output points. It can produce robustness certificates under a wide range of similarity (or distance) metrics in the output space, such as perceptual distance, intersection over union and cosine distance. These certificates guarantee that the change in the output, as measured by the distance metric, remains bounded for an adversarial perturbation of the input.
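To make the aggregation step concrete, below is a minimal Python sketch, not the exact procedure from the proposal: it evaluates a hypothetical base_model on Gaussian-noised copies of the input and, instead of computing the exact center of the smallest ball covering half of the outputs, approximates that center by the sampled output whose covering radius over half of the samples is smallest. The function name, the noise level sigma, the sample count and the default Euclidean metric are all illustrative assumptions.

import numpy as np

def smoothed_predict(base_model, x, sigma=0.25, n_samples=100, dist=None):
    """Illustrative sketch of smoothing a structured-output model.

    Aggregates predictions on noisy copies of x by picking the sampled
    output whose covering radius over at least half of the samples is
    smallest (a medoid-style approximation of the smallest covering ball).
    """
    if dist is None:
        dist = lambda a, b: np.linalg.norm(a - b)  # placeholder output metric

    # Evaluate the base model on Gaussian-noised versions of the input.
    outputs = [base_model(x + sigma * np.random.randn(*x.shape))
               for _ in range(n_samples)]

    # For each candidate center, find the radius needed to cover at least
    # half of the sampled outputs; keep the candidate with the smallest radius.
    best_center, best_radius = None, np.inf
    half = n_samples // 2 + 1
    for c in outputs:
        radii = sorted(dist(c, o) for o in outputs)
        r = radii[half - 1]
        if r < best_radius:
            best_center, best_radius = c, r
    return best_center, best_radius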

Next, we study some limitations of randomized smoothing when used to defend against Lp-norm bounded adversaries for p > 2, especially p = infinity. We show that this technique suffers from the curse of dimensionality when the smoothing distribution is independent and identically distributed across the input dimensions: the size of the certificates shrinks as the dimensionality of the input space grows. Thus, for high-dimensional inputs such as images, randomized smoothing does not yield meaningful certificates against an L-infinity norm bounded adversary.
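The core of this dilution can be seen in a short back-of-the-envelope calculation, written below in LaTeX; it covers only the naive conversion from an L2 certificate, not the full impossibility argument for general i.i.d. smoothing distributions developed in this work.

% Gaussian smoothing yields a certified \ell_2 radius $r$ that does not grow
% with the input dimension $d$. Converting it to an \ell_\infty certificate
% uses only the norm inequality
\[
  \|\delta\|_2 \;\le\; \sqrt{d}\,\|\delta\|_\infty
  \quad\Longrightarrow\quad
  r_\infty \;=\; \frac{r}{\sqrt{d}},
\]
% so the certified \ell_\infty radius shrinks as $d$ grows. For a
% $224 \times 224 \times 3$ image ($d \approx 1.5 \times 10^5$), the
% \ell_\infty radius is roughly $390$ times smaller than the \ell_2 radius.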

We then present certifiable robustness in the setting of reinforcement learning, where the adversary is allowed to track the states, actions and observations generated in previous time-steps and adapt its attack. We prove robustness guarantees for an agent following a Gaussian-smoothed policy. The goal here is to certify that the expected total reward obtained by the robust policy remains above a certain threshold under a norm-bounded adaptive adversary. Our main theoretical contribution is to prove an adaptive version of the Neyman-Pearson lemma, a key ingredient in smoothing-based certificates, in which the adversarial perturbation at a particular time-step is allowed to be a stochastic function of previous observations, states and actions. Our approach differs from existing techniques in that it can generate certificates for an entire episode instead of certifying predictions at individual time-steps.
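For illustration, the Python sketch below shows how such an episode-level bound could be estimated empirically; it assumes a classic Gym-style environment interface and uses the standard Gaussian smoothing bound as a stand-in for the adaptive Neyman-Pearson certificate developed in this work. The environment, policy, noise level sigma, attack budget eps and reward threshold are all placeholders.

import numpy as np
from scipy.stats import norm

def certify_smoothed_policy(env, policy, sigma, eps, threshold,
                            n_episodes=1000, horizon=100):
    """Illustrative sketch of an episode-level policy-smoothing certificate."""
    hits = 0
    for _ in range(n_episodes):
        obs, total_reward = env.reset(), 0.0
        for _ in range(horizon):
            # Gaussian-smoothed policy: act on a noisy copy of the observation.
            noisy_obs = obs + sigma * np.random.randn(*np.shape(obs))
            obs, reward, done, _ = env.step(policy(noisy_obs))
            total_reward += reward
            if done:
                break
        hits += total_reward >= threshold
    # Empirical probability that the clean episode reward clears the threshold
    # (a binomial lower confidence bound would be used in a real certificate).
    p_clean = hits / n_episodes
    # Smoothing-style lower bound on the same probability when the adversary's
    # perturbations have total l2 norm at most eps over the whole episode.
    p_adv = norm.cdf(norm.ppf(p_clean) - eps / sigma)
    return p_clean, p_adv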

We also design a method to certify confidence scores for neural network predictions under adversarial perturbations of the input. Conventional classification networks with a softmax layer output a confidence score that can be interpreted as the degree of certainty the network has about the class label. In applications such as credit scoring and disease diagnosis, where reliability is key, it is important to know how sure a model is about its predictions so that a human expert can take over if the model’s confidence is low. Our procedure uses the distribution of the confidence scores under randomized smoothing to generate stronger certificates than a naive approach that ignores this distributional information. Finally, we discuss some potential future directions of research in which we plan to further extend provable robustness to more settings such as Wasserstein shifts of the data distribution and time-series inputs.
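Returning to the confidence certificates described above, the Python sketch below shows one simplified way to use the distribution of smoothed confidence scores rather than a single summary statistic: it lower-bounds the survival probability Pr[confidence >= c] under attack at a grid of thresholds and integrates these bounds into a lower bound on the expected confidence under attack. The model interface, noise level, attack radius and bin count are illustrative assumptions, and the empirical probabilities would need binomial confidence corrections in a real certificate.

import numpy as np
from scipy.stats import norm

def certified_confidence_lower_bound(model, x, label, sigma=0.25, eps=0.1,
                                     n_samples=1000, n_bins=20):
    """Illustrative sketch: lower-bound the smoothed confidence in `label`
    under any input perturbation of l2 norm at most eps."""
    # Softmax confidence of the target label on Gaussian-noised inputs.
    noise = sigma * np.random.randn(n_samples, *x.shape)
    confs = np.array([model(x + n)[label] for n in noise])  # values in [0, 1]

    thresholds = np.linspace(0.0, 1.0, n_bins + 1)[1:]  # right bin endpoints
    bound = 0.0
    for c in thresholds:
        p_clean = np.mean(confs >= c)                   # empirical Pr[conf >= c]
        p_clean = np.clip(p_clean, 1e-6, 1 - 1e-6)
        p_adv = norm.cdf(norm.ppf(p_clean) - eps / sigma)
        # E[conf] = integral over t in [0, 1] of Pr[conf >= t], bounded bin by bin.
        bound += p_adv / n_bins
    return bound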
Examining Committee:
Chair: Dr. Soheil Feizi
Department Representative: Dr. John P. Dickerson
Dr. Tom Goldstein
Bio

Aounon Kumar is a PhD student in the Computer Science department. His research focuses on designing machine learning algorithms with verifiable guarantees on robustness. In the past, he has also worked on problems in theoretical computer science such as approximation algorithms and hardness results for combinatorial optimization problems.

This talk is organized by Tom Hurst