PhD Proposal: Incorporating Prosocial Constraints and Exploiting Problem Structure in Sequential Decision-making and Model Evaluation
Christine Herlihy
Abstract
Sequential decision-making tasks canonically feature an agent who must explore the environment and exploit the knowledge it gains to maximize total expected reward over some time horizon. However, when algorithms are used to make decisions and induce behaviors over time in high-stakes domains, it is often necessary to trade-off reward maximization against competing objectives, such as individual or group fairness, cooperation, or risk mitigation. Additionally, when the decisions to be made are combinatorial, careful use of the structural information which characterizes or connects our decision points may facilitate our search for efficient solutions.
In this proposal, we consider a constrained resource allocation task characterized by: (1) the presence of multiple objectives; and (2) the need to exploit different types of structure contained within the problem instances in order to ensure tractability and exploit externalities. We specifically consider the restless bandit setting, where a decision-maker is tasked with determining which subset of individuals (referred to as arms) should receive a beneficial intervention at each timestep, subject to the satisfaction of a budget constraint. Each restless arm is formalized as a Markov decision process (MDP), and receipt of the intervention results in an increased probability of a favorable state transition at the next timestep, relative to lack of receipt.
It is PSPACE-hard to pre-compute the optimal policy for a given set of restless arms in the general case. However, as conjectured by Whittle and proven by Weber and Weiss,
when each arm is indexable, a tractable solution exists that is provably asymptotically optimal: we can decouple the arms and consider a Lagrangian relaxation of the original
problem. Our core contributions include the introduction of novel algorithms to address two limitations of Whittle index-based policies, including (1) the lack of distributive fairness guarantees; and (2) the inability to exploit externalities when resources are allocated within a community.
In this proposal, we consider a constrained resource allocation task characterized by: (1) the presence of multiple objectives; and (2) the need to exploit different types of structure contained within the problem instances in order to ensure tractability and exploit externalities. We specifically consider the restless bandit setting, where a decision-maker is tasked with determining which subset of individuals (referred to as arms) should receive a beneficial intervention at each timestep, subject to the satisfaction of a budget constraint. Each restless arm is formalized as a Markov decision process (MDP), and receipt of the intervention results in an increased probability of a favorable state transition at the next timestep, relative to lack of receipt.
It is PSPACE-hard to pre-compute the optimal policy for a given set of restless arms in the general case. However, as conjectured by Whittle and proven by Weber and Weiss,
when each arm is indexable, a tractable solution exists that is provably asymptotically optimal: we can decouple the arms and consider a Lagrangian relaxation of the original
problem. Our core contributions include the introduction of novel algorithms to address two limitations of Whittle index-based policies, including (1) the lack of distributive fairness guarantees; and (2) the inability to exploit externalities when resources are allocated within a community.
Examining Committee
Bio
Christine Herlihy is a CS PhD student at the University of Maryland, College Park, where she is advised by John P. Dickerson. Her research interests include: sequential decision-making under uncertainty (i.e., multi-armed bandits; reinforcement learning), algorithmic fairness, knowledge representation and reasoning, and healthcare. During her PhD, she has interned at Amazon Robotics, Microsoft Research, and Google Research. Prior to UMD, she earned her MS from Georgia Tech and completed her undergraduate studies at Georgetown University. To learn more, you can visit her website.
This talk is organized by Tom Hurst