In recent years, rapid advancements in large language models (LLMs) have steadily shifted their applications from simple chatbots to increasingly complex, autonomous agents. Agentic applications require LLMs to interact with a broad range of external information sources, tools, and environments to solve intricate tasks with minimal human oversight—posing significant challenges to their reliability. This dissertation presents a series of contributions toward (more) reliable agentic LLMs.
Firstly, we explore how LLMs can be made more robust when incorporating external references—an essential capability for many agentic applications. We introduce chain-of-defensive-thought, a simple yet effective technique that instructs LLMs to generate a chain of thought mimicking a structured reasoning process of cross-checking. This highly accessible approach significantly improves the robustness of a wide range of LLMs against reference corruption. Importantly, it highlights a promising direction: exploiting the reasoning abilities of LLMs for robustness on tasks that are not necessarily reasoning-centric, which is a timely insight given the growing interest in LLM reasoning and the increasing reliability demands of agentic applications.
Secondly, we examine the reliability of tool use in agentic LLMs. While external tools can dramatically extend the capabilities of LLMs, the current paradigm—where models choose tools based solely on text descriptions—proves fragile. We demonstrate how strategic edits to tool descriptions can substantially bias tool usage, revealing a vulnerability in standard tool/function-calling protocols. These findings underscore the need for a grounded mechanism for agentic LLMs to select and utilize tools and resources.
Finally, we address the reliability of LLM evaluations, particularly in the presence of test set contamination, where models may (knowingly or not) train on test data prior to evaluation. We propose DyePack, a novel framework that repurposes backdoor techniques into a principled mechanism for identifying such contamination. DyePack operates without requiring access to model internals and supports both multiple-choice and open-ended tasks. More importantly, it provides provable guarantees by enabling exact false positive rate (FPR) computation before flagging any model as contaminated—effectively preventing false accusations while offering strong evidence for every case detected. This positions DyePack as a powerful tool for maintaining the integrity of open benchmarks and safeguarding our pathway toward reliable agentic LLMs.
Wenxiao Wang is a Ph.D. student in Computer Science at the University of Maryland. His recent research focuses on developing reliable large language models for agentic applications. He received his B.S. degree in Computer Science from the Yao Class at Tsinghua University in 2020.
He has held research intern positions at Sony AI (summer 2023) in the Privacy-Preserving Machine Learning (PPML) team and at Bytedance (summer 2022). He also worked as a research assistant at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University (2020–2021), was a visiting student researcher at UC Berkeley (2019), and interned at Bytedance AI Lab (2018).