Fine-tuning Text-to-Image Models: Reinforcement Learning and Confidence-aware Reward Optimization
Tuesday, February 27, 2024, 6:30-7:30 pm

Abstract

We present three recent results in fine-tuning text-to-image diffusion models.

(i) In the first work, we train a reward function on a human-labeled text-image dataset and fine-tune the model by maximizing the reward-weighted likelihood to improve text-image alignment. We also analyze several design choices and show that investigating them carefully is crucial for balancing the alignment-fidelity tradeoff.
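
As a rough sketch in our own notation (not necessarily the exact loss used in this work), the reward-weighted likelihood objective can be written as

    L(\theta) = -\,\mathbb{E}_{(x,z)\sim\mathcal{D}_{\text{human}}}\big[\, r_\phi(x,z)\,\log p_\theta(x \mid z) \,\big] \;+\; \beta\,\mathbb{E}_{x\sim\mathcal{D}_{\text{pre}}}\big[ -\log p_\theta(x) \big],

where r_\phi is the reward model trained on the human-labeled text-image data, z is the text prompt, x is the image, and the second, optional term regularizes toward the pre-training data to preserve image fidelity; the weight \beta is one example of the design choices that govern the alignment-fidelity tradeoff.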

(ii) In the second work published at NeurIPS-2023, we propose using online reinforcement learning (RL) to fine-tune text-to-image diffusion models. We first formulate the fine-tuning task as an RL problem, and then show how to update the pre-trained model using policy gradient. Our approach integrates policy optimization with KL regularization. We evaluate our method in terms of both text-image alignment and image quality.
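
Schematically, in our notation (the talk presents the precise formulation), the KL-regularized RL objective takes the form

    \max_\theta \; \mathbb{E}_{z\sim p(z)}\,\mathbb{E}_{x\sim p_\theta(\cdot\mid z)}\big[ r(x,z) \big] \;-\; \beta\,\mathbb{E}_{z\sim p(z)}\Big[ \mathrm{KL}\big( p_\theta(\cdot\mid z)\,\big\|\, p_{\text{pre}}(\cdot\mid z) \big) \Big],

where p_{\text{pre}} is the pre-trained diffusion model, r is the reward, and \beta weights the KL penalty. In practice the KL term can be decomposed over the denoising steps of the diffusion chain, and \theta is updated with policy-gradient estimates of this objective.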

(iii) Finally, in the last work, currently under review at ICLR-2024, we address the problem of reward over-optimization in text-to-image models. We first introduce the Text-Image Alignment Assessment (TIA2) benchmark, a diverse collection of text prompts, images, and human annotations, for studying this issue. Evaluating several state-of-the-art reward models for text-to-image generation on our benchmark, we find that they are often poorly aligned with human assessment. We then introduce TextNorm, a simple method that induces confidence calibration in reward models by normalizing scores across a set of prompts that are semantically different from the original prompt, and demonstrate that fine-tuning with the confidence-calibrated scores effectively reduces the risk of over-optimization.
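
As an illustrative sketch, assuming a softmax-style normalization in our own notation (the exact calibration used in TextNorm may differ), the calibrated score of an image x for prompt z against a set \mathcal{Z} of semantically contrastive prompts could take the form

    \tilde{r}(x, z) = \frac{\exp\big(r(x,z)/\tau\big)}{\sum_{z' \in \{z\}\cup\mathcal{Z}} \exp\big(r(x,z')/\tau\big)},

where \tau is an assumed temperature parameter. Under this form, an image scores highly only when the reward for the original prompt z clearly dominates the rewards for the contrastive prompts, and fine-tuning then uses these calibrated scores in place of the raw reward.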

Bio

Mohammad Ghavamzadeh received his Ph.D. from UMass Amherst in 2005. He was a postdoctoral fellow at UAlberta from 2005 to 2008 and a permanent researcher at INRIA, France, from 2008 to 2013. He received the “INRIA award for scientific excellence” in 2011 and obtained his Habilitation in 2014.

Since 2013, he has held research scientist positions at Adobe, Facebook AI Research (FAIR), and Google Research, and is now at Amazon AWS AI Labs. He has published over 120 refereed papers in major machine learning, AI, and control journals and conferences, and has co-chaired more than 10 workshops and tutorials at NeurIPS, ICML, and AAAI. His research has mainly focused on reinforcement learning, bandit algorithms, and recommendation systems. Over the last two years, he has also been working on the problem of alignment in generative AI models.


Note: Please register using the Google Form on our website https://go.umd.edu/marl for access to the Google Meet link, the Open-source Multi-Agent AI Research Community, and talk resources.

This talk is organized by Saptarashmi Bandyopadhyay.