https://umd.zoom.us/j/
In data poisoning, attackers maliciously manipulate training data to influence the behavior of learning algorithms. In this talk, I will present the progress my coauthors and I have made toward provably mitigating the threat of data poisoning. First, I will introduce aggregation-based certified defenses against general data poisoning. Next, I will discuss the Lethal Dose Conjecture, a theoretical framework that addresses the fundamental limits of robustness against data poisoning. The conjecture connects optimal robustness to few-shot learning on clean data, and I will present theoretical results that verify it in multiple cases. I will also explain its significance: if the conjecture holds for a given task, aggregation-based defenses are asymptotically optimal. In essence, given the most data-efficient learner for a task, we can transform it into one of the most robust defenses against data poisoning, thereby reducing the challenge of defending against data poisoning to that of few-shot learning. Given that defending against general data poisoning can be theoretically very difficult, where should we go from here? I will present an idea that uses temporal concepts to measure attack budgets, leading to novel threat models of data poisoning that apply in practical scenarios where traditional threat models fall short.
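To make the aggregation-based approach concrete, below is a minimal sketch in the spirit of Deep Partition Aggregation, one instantiation of the certified defenses mentioned above. All names here (partition, certified_predict, base_learner) are illustrative rather than the API of any released implementation, and tie-breaking is handled conservatively.

from collections import Counter

def partition(dataset, k):
    # Split the training set into k disjoint partitions using a
    # deterministic hash of each sample, so that inserting or removing
    # one poisoned sample changes the contents of at most one partition.
    # (A real implementation would use a stable hash such as hashlib;
    # Python's built-in hash is salted across runs for strings.)
    parts = [[] for _ in range(k)]
    for sample in dataset:
        parts[hash(sample) % k].append(sample)
    return parts

def certified_predict(base_learner, dataset, k, x):
    # Train one base classifier per partition and predict by plurality vote.
    classifiers = [base_learner(part) for part in partition(dataset, k)]
    votes = Counter(clf(x) for clf in classifiers)
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    runner_up_count = ranked[1][1] if len(ranked) > 1 else 0
    # Each poisoned sample flips at most one partition's vote, so the
    # plurality winner is stable as long as 2 * budget < (top - runner-up);
    # counting ties as failures gives a certified radius of (gap - 1) // 2.
    radius = max(0, (top_count - runner_up_count - 1) // 2)
    return top_label, radius

The certificate is per-input: for each test point, the vote gap tells us how many poisoned training samples an attacker would need before the prediction could possibly change.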
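As for the quantitative content of the Lethal Dose Conjecture, the following is a hedged paraphrase in my own notation, not the talk's exact statement. For a task whose most data-efficient learner needs roughly n clean samples, the conjecture asserts that, with a training set of size N, the maximal number of poisoned samples any defense can tolerate is

\[ \Theta\!\left(\frac{N}{n}\right). \]

This is the sense in which aggregation becomes asymptotically optimal under the conjecture: partitioning the N samples into \Theta(N/n) groups of size \Theta(n) and voting, as in the sketch above, tolerates on the order of N/n poisoned samples whenever the base learner is near-optimally data-efficient.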
Wenxiao Wang is currently a PhD student in Computer Science at the University of Maryland, under the supervision of Prof. Soheil Feizi. His research interests include, but are not limited to, machine learning robustness, privacy-preserving machine learning, and self-supervised representation learning. Previously, he worked on mitigating data poisoning.