PhD Defense: Towards Principled AI Agents with Asymmetric and Decentralized Information
Xiangyu Liu
IRB-5105 https://umd.zoom.us/j/7269246363#success
Monday, June 1, 2026, 2:30-4:00 pm
Abstract

AI models have been increasingly deployed to develop autonomous agents for decision-making, with prominent applications in Go, video games, robotics, autonomous driving, healthcare, and human assistance. Many of these successes involve multiple AI agents interacting dynamically with one another and with humans. At the same time, these systems often face challenging partial observability, further complicated by asymmetric information between training and testing and decentralized information across agents. This thesis aims to lay the theoretical foundations for principled AI agents operating under partial observability with asymmetric and decentralized information.

First, we study reinforcement learning agents in partially observable Markov decision processes when privileged information is available during training but not at test time, a common setting in robot learning and deep RL. We revisit two major empirical paradigms, expert distillation, also known as teacher-student learning, and asymmetric actor-critic, and identify their limitations in finding near-optimal policies. We then develop a principled algorithm with polynomial sample complexity and quasi-polynomial computational complexity, revealing the provable benefits of privileged information for AI agents in partially observable environments.

Second, we study multi-agent reinforcement learning with information sharing under decentralized information. To circumvent known hardness results and avoid computationally intractable oracles, we advocate leveraging potential information sharing among agents. We establish several computational complexity results that justify the necessity of information sharing and appropriate observability assumptions. Motivated by the inefficiency of planning in the ground-truth model, we further approximate the shared common information to construct an approximate model of the partially observable stochastic game, in which approximate equilibrium planning can be quasi-efficient under these assumptions. We then develop a partially observable multi-agent RL algorithm that is both statistically and computationally quasi-efficient.

Third, we study multi-agent RL with latent state representations under decentralized information. Beyond using common information in the raw observation space, we propose to align different agents' latent representations through a new representation learning framework, Representationally Aligned Approximate Latent Model, or RA2LM. We establish conditions under which latent-model equilibria exist and can be used to solve the original dynamic game before compression. We also develop provable representation learning algorithms for computing such latent-model equilibria with both computational and statistical efficiency. Along the way, we design an efficient learning algorithm for an important special case of RA2LM, partially observable stochastic games with deterministic filters, which improves on existing results by addressing the curse of multiagency and relaxing the required privileged-information assumptions.

Finally, we examine large-language-model-powered agents, which use LLMs as the main controller for decision-making, with the goal of understanding and enhancing their capabilities in canonical decentralized and multi-agent scenarios. In particular, we use regret, a standard metric in online learning and RL, to study the in-context decision-making limits of LLM agents through controlled experiments. Motivated by the observed no-regret behaviors, we propose a hypothetical model that explains such behaviors well and prove that it is a natural consequence of pre-training via next-token prediction.
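For readers unfamiliar with the regret metric mentioned above, the following is a minimal sketch of the standard external-regret computation from online learning: the cumulative loss an agent actually incurred, minus the cumulative loss of the best single fixed action in hindsight. It is not taken from the thesis; the function name `external_regret` and the toy loss table are hypothetical and chosen only for illustration. An agent is usually called no-regret when this quantity grows sublinearly in the number of rounds, so that the per-round average tends to zero.

```python
from typing import Sequence


def external_regret(losses: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Return incurred cumulative loss minus that of the best fixed action in hindsight.

    losses[t][k] is the loss of action k in round t; actions[t] is the action played.
    """
    if not losses or len(losses) != len(actions):
        raise ValueError("losses and actions must be non-empty and of equal length")
    num_actions = len(losses[0])
    # Loss the agent actually paid over all rounds.
    incurred = sum(round_losses[a] for round_losses, a in zip(losses, actions))
    # Loss of the single best action if it had been played in every round.
    best_fixed = min(
        sum(round_losses[k] for round_losses in losses) for k in range(num_actions)
    )
    return incurred - best_fixed


# Hypothetical toy data (illustrative only): 4 rounds, 2 actions.
toy_losses = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
toy_actions = [0, 1, 0, 1]
toy_regret = external_regret(toy_losses, toy_actions)
toy_average_regret = toy_regret / len(toy_losses)
```

In this toy run the agent pays a total loss of 3.0 while always playing action 1 would have cost 1.0, so the regret is 2.0 and the average regret per round is 0.5.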

Together, these results provide a principled understanding of AI agents under partial observability, asymmetric information, and decentralized information, paving the way toward practical AI agents for real-world strategic and decentralized systems.

Bio

Xiangyu Liu is currently pursuing a Ph.D. in Computer Science at the University of Maryland, College Park. His research interests include reinforcement learning in multi-agent and partially observable settings, with applications to large language model agents. His work has appeared in venues including ICML, NeurIPS, ICLR, and SIAM, and has received a spotlight presentation at ICLR and an Outstanding Paper Award at a NeurIPS workshop. He received the Dean’s Fellowship from the University of Maryland in 2021, and has held research internships with Bloomberg AI Group in 2022 and Google Research in 2025.

This talk is organized by Migo Gui