PhD Proposal: Tailoring Reinforcement Learning Approaches for Socially Consequential Domains
Aviva Prins
Wednesday, March 26, 2025, 9:00 am-11:00 am
Abstract

Reinforcement learning (RL) is a powerful mathematical framework for decision making under uncertainty. Unfortunately, many RL solutions, despite being state-of-the-art, would be inappropriate to deploy in practice, because key application-motivated desiderata, such as the equitable distribution of resources, are missing from the initial algorithmic design step. In this thesis, we will discuss this phenomenon in three settings: bandits, single-agent RL, and multi-agent RL.

Our first goal is to incorporate fairness in restless and collapsing bandits, which are often used to model budget-constrained resource allocation in settings where arms have action-dependent transition probabilities, such as the allocation of health interventions among patients. We therefore introduce ProbFair, a probabilistically fair policy that maximizes total expected reward subject to the budget constraint while guaranteeing a strictly positive lower bound on each arm's probability of being pulled at every time step. We evaluate our algorithm on a real-world application, where interventions support continuous positive airway pressure (CPAP) therapy adherence among patients, as well as on a broader class of synthetic transition matrices. We show that ProbFair preserves utility while providing fairness guarantees.
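As a rough formalization (the notation here is ours, not necessarily the talk's): with N arms, per-round budget k, horizon T, binary pull actions a_{n,t}, and per-arm state rewards r_n, the ProbFair objective can be sketched as

\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{T}\sum_{n=1}^{N} r_n(s_{n,t})\right]
\quad\text{s.t.}\quad \sum_{n=1}^{N} a_{n,t} \le k \;\;\forall t,
\qquad \Pr\!\left(a_{n,t}=1\right) \ge \ell > 0 \;\;\forall n,t,

where \ell is the fairness parameter; taking \ell = 0 recovers the unconstrained budgeted problem.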

Our work on ProbFair highlights another limitation of previous work: in certain applications, practitioners are willing to trade some optimality for lower cost, as long as that trade-off is quantifiable. We continue to address this problem in the single-agent non-stationary Markov decision process setting. Previous theoretical work is limited by its focus on minimizing dynamic regret over many iterations. We expand the discussion of these algorithms to include time-to-convergence as well as cost. In addition, we introduce a crop management case study.
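For context, one standard definition of dynamic regret over M episodes (again, notation ours) compares each episode's policy \pi_m against that episode's optimal policy under the current, possibly changed, dynamics:

\mathrm{R}_{\mathrm{dyn}}(M) = \sum_{m=1}^{M}\left( V_m^{*}(s_{m,1}) - V_m^{\pi_m}(s_{m,1}) \right).

Minimizing this sum over many episodes says little about how quickly a policy becomes usable or how much data collection costs, which motivates the time-to-convergence and cost analysis above.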

For the proposed work, we investigate two directions. First, we investigate solution methods for non-stationary and socially consequential environments; in the course of this analysis, we improve upon the RestartQ-UCB approach of Mao et al. [2024]. Second, we investigate notions of equitable resource distribution among multiple farmer agents.
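To convey the restart idea behind RestartQ-UCB only, here is a minimal Python schematic; it is not the authors' implementation, and the environment interface (reset/step), the step size, and the bonus form are illustrative assumptions:

import numpy as np

def restart_q_learning(env, H, n_states, n_actions, num_epochs,
                       episodes_per_epoch, c_bonus=1.0):
    """Schematic restart-based optimistic Q-learning for a non-stationary
    episodic MDP with horizon H. Assumes a hypothetical env with
    env.reset() -> state and env.step(action) -> (next_state, reward)."""
    episode_returns = []
    for _ in range(num_epochs):
        # Restart: discard all estimates so that data gathered under old
        # dynamics cannot bias estimates of the current dynamics.
        Q = np.full((H, n_states, n_actions), float(H))  # optimistic init
        N = np.zeros((H, n_states, n_actions))           # visit counts
        for _ in range(episodes_per_epoch):
            s = env.reset()
            total = 0.0
            for h in range(H):
                a = int(np.argmax(Q[h, s]))  # greedy w.r.t. optimistic Q
                s_next, r = env.step(a)
                total += r
                N[h, s, a] += 1
                alpha = (H + 1) / (H + N[h, s, a])  # optimistic step size
                # Schematic exploration bonus; the actual algorithm's bonus
                # includes logarithmic and variance-dependent terms.
                bonus = c_bonus * np.sqrt(H ** 2 / N[h, s, a])
                v_next = Q[h + 1, s_next].max() if h + 1 < H else 0.0
                target = r + v_next + bonus
                Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * min(target, float(H))
                s = s_next
            episode_returns.append(total)
    return episode_returns

Restarting trades statistical efficiency for robustness to drift; choosing the epoch length as a function of how much the environment varies is the crux of the regret analysis.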

Bio

Aviva Prins is a PhD student at the University of Maryland. Her research focuses on developing mathematical tools and algorithms for socially consequential applications, with an emphasis on equitable resource distribution.

This talk is organized by Migo Gui.