In this talk, we examine reward-directed generation through conditional diffusion models, a technique with wide applications in generative AI and with potential impact on optimization and decision making. We address a common learning scenario in which a dataset contains both unlabeled data and a smaller set with noisy reward labels. Our approach learns a reward function on the smaller labeled set and uses it as a pseudolabeler, which lets us generate samples conditioned on desired rewards while uncovering latent data representations. Our theory shows that the model can sample from the reward-conditioned data distribution and steer the generated population toward a user-specified target reward, with an optimality gap that aligns with off-policy bandit regret. We emphasize the interplay between reward signal strength, distribution shift, and the cost of off-support extrapolation in achieving near-optimal generated samples, and we present empirical results in image generation, reinforcement learning, and control.
Mengdi Wang is an associate professor at the Department of Electrical and Computer Engineering and the Center for Statistics and Machine Learning at Princeton University. She is also affiliated with the Department of Computer Science and Princeton’s ML Theory Group. She was a visiting research scientist at DeepMind, IAS, and the Simons Institute for the Theory of Computing. Her research focuses on machine learning and reinforcement learning, with applications in healthcare, drug discovery, and intelligent systems. Mengdi received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, she was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Before that, she received her bachelor's degree from the Department of Automation, Tsinghua University. Mengdi received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, the Google Faculty Award in 2017, the MIT Tech Review 35-Under-35 Innovation Award (China region) in 2018, and the WAIC YunFan Award in 2022. She serves as a Program Chair for ICLR 2023, a Senior AC for NeurIPS 2023, an associate editor for Operations Research and Mathematics of Operations Research, and an area chair for ICML and AISTATS, and she is on the editorial board of the Journal of Machine Learning Research. Her research is supported by NSF, AFOSR, NIH, ONR, Google, Microsoft C3.ai DTI, FinUP, and RVAC Medicines.
Note: For future talks, you can register using the Google Form on our website https://go.umd.edu/marl