PhD Proposal: Enhancing Human-AI Interactions through Reinforcement Learning
Wichayaporn Wongkamjan
IRB-5137 https://umd.zoom.us/j/94615978626
Thursday, October 16, 2025, 2:00-3:30 pm
Abstract

Reinforcement Learning (RL) has long been a crucial technique for solving decision-making problems. In recent years, RL has been increasingly applied to language models to align outputs with human preferences and guide reasoning toward verifiable answers (e.g., solving mathematical problems in MATH and GSM8K datasets). However, RL relies heavily on feedback or reward signals that often require human annotations or external verifiers. When these signals are subtle or ambiguous---as is common in natural language tasks---RL struggles, especially in socially complex domains such as negotiation (e.g., reaching a fair trade agreement between countries), collaboration, and betrayal (e.g., detecting and responding to deception in strategic games like Diplomacy). These situations require social intelligence that human experts routinely use, but that is challenging for current RL systems to capture and learn from.

Our goal is to explore complex decision-making and language environments, exploiting their near-optimal strategies while exposing their flaws in language. Ultimately, we aim to enhance human and artificial intelligence (AI) interaction: making AI language use closer to that of humans and leveraging AI strengths to guide humans in complex decision-making tasks. We build on Cicero, the first Diplomacy AI agent trained to be near-optimal in decision-making while also communicating in natural language. Although Cicero has been over-claimed to be human-level, and it does outperform humans in terms of reward, it lacks essential cooperation and deception skills, which are critical components in a competitive setting like Diplomacy.

Although Cicero still falls short of human-level performance in natural language, its strength in strategic modeling presents a valuable opportunity for use as an AI advisory system. To leverage this, we introduce PHOLUS, a system that integrates Cicero's strategic guidance into human play to support decision-making in Diplomacy. Our results show that PHOLUS enables beginners---with little to no prior experience---to achieve performance on par with expert players. This demonstrates the potential of human–AI collaboration, where PHOLUS provides strategic insights that effectively bridge the skill gap between novice and experienced players.

Deception in Diplomacy is notoriously difficult to detect; even experienced human players often fail to recognize it. To address this challenge, we introduce CTRL-D, a framework that outperforms both humans and large language models in identifying deceptive proposals. CTRL-D applies counterfactual reinforcement learning, leveraging Cicero’s rollouts and value model to estimate the potential cost if a proposer were to betray their stated intentions. We frame this analysis through concepts borrowed from scam tactics---bait, switch, and edge---to characterize different forms of deceptive communication. By focusing on Diplomacy’s structured environment, we show that detecting deception is difficult even with clear state and action spaces, underscoring the need for more robust methods in both game and real-world settings.
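The counterfactual idea above can be illustrated with a minimal sketch: score a proposal by how much the proposer would gain by deviating from it, using a value estimate as a stand-in for Cicero's rollouts. All names here (the value function, actions, and `deception_risk`) are illustrative assumptions, not the actual CTRL-D or Cicero API.

```python
# Hedged sketch of a counterfactual deception signal: estimate the payoff a
# proposer would earn by honoring their proposal versus betraying it. A large
# positive gap suggests the proposal may be "bait".

def deception_risk(value, state, proposed_joint, proposer_alternatives):
    """Return the proposer's estimated gain from betraying the proposal.

    value(state, joint_action) -> proposer's estimated payoff, a toy
    stand-in for rollouts through a learned value model.
    proposed_joint is a dict mapping each player to their proposed action.
    """
    honest_value = value(state, proposed_joint)
    # Counterfactual: the proposer unilaterally switches to another action.
    best_betrayal = max(
        value(state, {**proposed_joint, "proposer": alt})
        for alt in proposer_alternatives
    )
    return best_betrayal - honest_value
```

For example, with a toy value function where attacking a partner who holds position pays 2.0 but honoring a support pays 1.0, the proposal "I will support you" carries a deception risk of 1.0, flagging the proposer's incentive to switch.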

To advance AI communication and strengthen its collaboration with humans, we propose an AI agent that communicates, reasons, and acts like a human expert. Our research follows two directions: (1) behavior cloning and knowledge distillation from obsolete language models into modern ones, focusing on the contexts and skills that modern models still need to learn; (2) training agents for effective negotiation by first grounding language in constrained action spaces, where communicative strategies can be better isolated and evaluated. We design methods and reward verifiers that promote collaborative communication and decisions.

Bio

Wichayaporn is a fifth-year Ph.D. student in Computer Science at the University of Maryland, College Park, advised by Prof. Jordan Boyd-Graber and formerly by Prof. Furong Huang. Her research focuses on developing reinforcement-learning-based methods for human-AI interaction, aligning multi-agent systems with human behavior and preferences. She has published at venues such as ACL, NAACL, ICML, and ICLR.

This talk is organized by Migo Gui