Large Language Models (LLMs) have achieved remarkable progress in reasoning and alignment, largely driven by post-training methods such as reinforcement learning and reward-model-based supervision. However, such guidance is often imperfect, or entirely unavailable, limiting the reliability of these models. This thesis aims to improve large models and make them more reliable both with and without external guidance.
First, we study process supervision used primarily for Large Reasoning Models (LRMs) and show that Process Reward Models (PRMs) often overcredit incorrect steps, leading to false positives and policy misalignment. We provide a formal analysis of this issue and introduce an Overcredit Contrastive (OC) loss, which penalizes false positives to produce more precise step-level signals and better-aligned reasoning policies.
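As a hedged illustration only (the exact formulation is given in the thesis), the sketch below shows one way a step-level objective could up-weight the penalty on overcredited, false-positive steps; the weighted cross-entropy form, the function name, and the `fp_weight` parameter are assumptions, not the actual OC loss.

```python
import torch
import torch.nn.functional as F

def overcredit_penalized_loss(step_scores, step_labels, fp_weight=2.0):
    """Illustrative (assumed) step-level loss that penalizes overcredited steps.

    step_scores: PRM scores in (0, 1) for each reasoning step, shape [T]
    step_labels: 1 if the step is actually correct, 0 otherwise, shape [T]
    fp_weight:   extra penalty on incorrect steps that received high credit
    """
    step_labels = step_labels.float()
    # Base objective: binary cross-entropy between PRM scores and step correctness.
    bce = F.binary_cross_entropy(step_scores, step_labels, reduction="none")
    # False positives: incorrect steps (label 0) that the PRM scored highly.
    fp_strength = (1.0 - step_labels) * step_scores.detach()
    # Up-weight those terms so overcrediting is punished more than ordinary errors.
    return ((1.0 + fp_weight * fp_strength) * bce).mean()
```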
Next, we investigate verifier-free improvements at both inference and training time. (i) In Uncertainty-Aware Answer Selection, we calibrate token log-likelihoods across diverse LLMs to identify the most reliable response, relying solely on model confidence and consistently outperforming majority voting and single-model self-consistency at comparable cost. (ii) In EnsemW2S, we propose a token-level weak-to-strong (W2S) ensembling approach that iteratively corrects the weaknesses of smaller experts using a small set of labeled examples, then uses the refined ensemble to supervise stronger students, improving generalization to out-of-distribution and high-difficulty reasoning tasks.
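A rough, assumed sketch of the selection idea: score each candidate by its mean token log-likelihood, calibrate those scores per model so that confidences from different LLMs are comparable, then return the best-scoring answer. The z-score calibration, field names, and function below are illustrative assumptions, not the exact procedure.

```python
from statistics import mean, pstdev

def select_most_reliable(candidates):
    """Pick the answer with the highest calibrated confidence (assumed sketch).

    candidates: list of dicts such as
        {"model": "llm-a", "answer": "42", "token_logprobs": [-0.12, -0.30, ...]}
    """
    # Raw confidence: mean token log-likelihood of each candidate response.
    for c in candidates:
        c["raw"] = mean(c["token_logprobs"])

    # Per-model z-score calibration so scores from diverse LLMs are comparable.
    per_model = {}
    for c in candidates:
        per_model.setdefault(c["model"], []).append(c["raw"])
    for c in candidates:
        scores = per_model[c["model"]]
        mu, sd = mean(scores), (pstdev(scores) or 1.0)
        c["calibrated"] = (c["raw"] - mu) / sd

    # The most reliable response is the one with the highest calibrated confidence.
    return max(candidates, key=lambda c: c["calibrated"])["answer"]
```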
Finally, we extend our analysis to vision–language models (VLMs) and show that improving inter-modal alignment, by rebalancing attention toward visual evidence, significantly reduces hallucinations and strengthens factual consistency.
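One assumed way to rebalance attention toward visual evidence is to add a small positive bias to the pre-softmax attention logits at image-token positions; the sketch below is purely illustrative, and the additive-bias form and `visual_boost` value are assumptions rather than the thesis's method.

```python
import torch

def rebalance_toward_vision(attn_logits, image_token_mask, visual_boost=1.0):
    """Assumed sketch: bias decoder attention toward image tokens.

    attn_logits:      [batch, heads, queries, keys] pre-softmax attention scores
    image_token_mask: boolean mask of shape [keys] marking image-token positions
    visual_boost:     additive logit bias applied to image tokens (illustrative)
    """
    bias = torch.zeros_like(attn_logits)
    bias[..., image_token_mask] = visual_boost  # boost only visual-evidence keys
    return torch.softmax(attn_logits + bias, dim=-1)
```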
Aakriti is a fourth-year PhD student in the Department of Computer Science at the University of Maryland, College Park, advised by Prof. Furong Huang. Her research focuses on advancing reasoning and alignment in large language models (LLMs), with a particular emphasis on process supervision, post-training, and multi-LLM systems. Her broader goal is to make AI agents reliable, safe, and robust. She has previously interned at Amazon, Dolby, and Capital One, and has published work at ICML, EMNLP, NeurIPS, ICRA, and IROS.

