Decentralized coordination at unsignalized intersections remains a persistent failure mode for modern autonomous driving policies when vehicle-to-everything (V2X) communication is unavailable. Policies trained primarily on ego-centric objectives (e.g., collision avoidance, comfort, and action consistency) can be overly conservative in symmetric interactions, which leads to deadlocks. They can also make conflicting commitments, which leads to unsafe near-collisions. This thesis addresses this failure mode by introducing a social post-training method for Alpamayo-R1 (AR1) that explicitly rewards behavior that is predictable to neighboring agents.
We extend AR1's Group Relative Policy Optimization (GRPO) post-training by augmenting its reward with Expectation Alignment (ELIGN), an intrinsic social term that penalizes the mismatch between the prediction of a learned neighbor-expectation model and the realized shared next observation. To make ELIGN applicable to AR1's continuous trajectory outputs, we define the shared observation space over low-dimensional kinematic waypoints (x, y, ψ, v) rather than high-dimensional perception features, and we jointly learn a compact trajectory prediction model during fine-tuning.
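The reward shaping described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the ELIGN mismatch is a squared error over the (x, y, ψ, v) waypoint with the heading difference wrapped to [-π, π), that the social term is subtracted from the ego reward with a hypothetical weight `lam`, and that GRPO advantages are rewards standardized within a group of rollouts. The neighbor-expectation model's prediction is taken as a given input. All function names are hypothetical and are not an AR1 or AlpaSim API, and the exact form of the penalty used in this work may differ.

```python
import numpy as np

def wrap_angle(a):
    """Wrap angles (radians) to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def elign_penalty(predicted, realized):
    """Hypothetical ELIGN-style mismatch: squared error between a neighbor's
    predicted and the realized next waypoint, each of shape (..., 4) = (x, y, psi, v).
    The heading (psi) difference is wrapped so 179 deg vs -179 deg counts as close."""
    diff = np.asarray(realized, dtype=float) - np.asarray(predicted, dtype=float)
    diff[..., 2] = wrap_angle(diff[..., 2])
    return np.sum(diff ** 2, axis=-1)

def shaped_reward(ego_reward, predicted, realized, lam=0.1):
    """Ego-centric reward minus a weighted social term (lam is a hypothetical weight)."""
    return np.asarray(ego_reward, dtype=float) - lam * elign_penalty(predicted, realized)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group of rollouts
    sampled for the same scenario."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four rollouts of the same scenario (illustrative numbers, not results).
ego = np.array([1.0, 1.0, 0.5, 0.0])
predicted = np.zeros((4, 4))
realized = np.array([[0.0, 0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0, 0.0],   # 1 m off the expected position
                     [0.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 2.0]])  # 2 m/s faster than expected
r = shaped_reward(ego, predicted, realized, lam=0.1)  # -> [1.0, 0.9, 0.5, -0.4]
adv = group_relative_advantages(r)
```

Because advantages are standardized within each group, a rollout whose realized motion diverges from what its neighbors predicted is ranked below an otherwise identical rollout with the same ego reward (compare the first two rollouts).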
We evaluate the proposed AR1+ELIGN post-training on a multi-agent simulation benchmark of symmetric four-way arrival scenarios in AlpaSim, comparing it against an ego-centric AR1 baseline and standard multi-agent reinforcement learning baselines (PPO and MAPPO). Performance is measured by collision rate (treated as a hard safety constraint), deadlock rate, intersection clearance time, and jerk variance (as an indicator of indecision). Finally, we study zero-shot social generalization by testing whether ELIGN-fine-tuned agents coordinate effectively with novel partner agents not encountered during training. Results show that introducing an expectation-aligned intrinsic reward improves decentralized intersection throughput while preserving safety, and they provide evidence of improved coordination with unseen partners.
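Of the metrics above, jerk variance is the least standard. The following minimal sketch assumes that longitudinal jerk is estimated by finite differences of a speed profile sampled every `dt` seconds; the exact definition used in the evaluation (for example, whether lateral jerk is included) is not specified here, and `jerk_variance` is a hypothetical helper.

```python
import numpy as np

def jerk_variance(speeds, dt):
    """Variance of longitudinal jerk from a speed profile sampled every dt seconds.
    Acceleration is the first finite difference of speed; jerk is the second."""
    v = np.asarray(speeds, dtype=float)
    if v.size < 3:
        raise ValueError("need at least three speed samples")
    accel = np.diff(v) / dt
    jerk = np.diff(accel) / dt
    return float(np.var(jerk))

# A constant-acceleration ramp has zero jerk ...
smooth = jerk_variance([0.0, 1.0, 2.0, 3.0, 4.0], dt=0.1)
# ... while a stop-and-go (indecisive) profile has large jerk variance.
hesitant = jerk_variance([2.0, 1.0, 2.0, 1.0, 2.0], dt=0.1)
```

A hesitant agent that repeatedly brakes and re-accelerates while negotiating the intersection produces alternating jerk spikes, so its variance is large even if its average speed is similar to that of a decisive agent.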
John Cole is an M.S. candidate in Computer Science at the University of Maryland, College Park, where he also earned his B.S. in Computer Science and Mathematics. Advised by Prof. Tom Goldstein, he focuses his master's thesis research on post-training Vision-Language-Action (VLA) models to incentivize safe driving behaviors and zero-shot coordination in autonomous vehicles. His broader academic work spans explainable AI, model predictive control, and multi-agent AI. Previously, John served as an Embedded Software Engineer at Origin AI, where he designed systems for offline intrusion detection using wireless signal processing. He has also interned at Evans and Chambers Technology in Washington, D.C., and at the Johns Hopkins University Applied Physics Laboratory.
Examining Committee Chair: Dr. Tom Goldstein
Members:
Dr. David Jacobs
Dr. Christopher Metzler

