PhD Defense: Safety, Robustness and Reliability of AI
Gaurang Sriramanan
IRB-4109 https://umd.zoom.us/my/gaurangs
Wednesday, April 8, 2026, 12:00-2:00 pm
Abstract

Over the past few years, rapid advances in Artificial Intelligence (AI) have produced dramatic gains in performance across domains ranging from computer vision to natural language understanding. Given the widespread use of these systems in safety-critical applications such as autonomous navigation and medical diagnosis, it is imperative to characterize their vulnerabilities and develop robust mitigation strategies. This thesis investigates the Safety, Robustness, and Reliability of AI through a taxonomy of vulnerabilities along three primary dimensions: oversensitivity to input perturbations, undersensitivity to semantic shifts, and structural limitations in generative reliability.

First, we investigate the phenomenon of oversensitivity in deep networks, where minor changes to the input cause disproportionately large and often catastrophic model failures. To address this in the vision domain, we develop Nuclear Curriculum Adversarial Training (NCAT), an efficient single-step training procedure that yields models robust against a union of Lp threat models (L1, L2, and L-infinity). By introducing a curriculum schedule to mitigate catastrophic overfitting, we obtain the first L1-robust model trained with single-step adversaries, with performance comparable to multi-step methods. We further investigate oversensitivity in Large Language Models (LLMs) by introducing BEAST, a fast beam-search-based adversarial attack that can jailbreak standard LLMs in under one GPU-minute.
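NCAT and BEAST themselves are not reproduced here; as a minimal sketch of the single-step idea behind such methods, the following NumPy toy applies an FGSM-style L-infinity perturbation whose budget is ramped up by a simple schedule. The function names and the linear ramp are illustrative assumptions, not the thesis method:

```python
import numpy as np

def fgsm_step(x, grad, eps):
    """Single-step L-infinity adversarial perturbation (FGSM-style):
    move the input along the sign of the loss gradient, bounded by
    eps in the L-infinity norm, then clip back to the valid range."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

def curriculum_eps(step, total_steps, eps_max):
    """Linearly ramp the perturbation budget over training -- a toy
    stand-in for a curriculum schedule on attack strength."""
    return eps_max * min(1.0, (step + 1) / total_steps)

# Toy input and a fixed (made-up) loss gradient.
x = np.array([0.5, 0.2, 0.9])
grad = np.array([0.3, -0.1, 0.0])
for step in range(4):
    eps = curriculum_eps(step, 4, 8 / 255)
    x_adv = fgsm_step(x, grad, eps)
```

Growing the perturbation budget gradually, rather than attacking at full strength from the start, is one common way to mitigate the catastrophic overfitting that plagues single-step adversarial training.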

Second, we characterize the complementary problem of undersensitivity, wherein models maintain a near-uniform level of confidence despite large, perceptually significant changes in the input. We present a novel Level Set Traversal (LST) algorithm that iteratively uses orthogonal components of the local gradient to identify the "blind spots" of common vision models. Studying the geometry of these level sets, we show that there exist linearly connected paths in input space between images that a human oracle would deem extremely disparate, yet along which vision models retain a near-uniform level of confidence.
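The full LST algorithm is more involved; as a minimal sketch of the core step, the toy below projects a desired direction onto the subspace orthogonal to the local gradient, so that the model's output is (locally) unchanged. A linear model is used so its level sets are exact hyperplanes; all names are hypothetical:

```python
import numpy as np

def level_set_step(x, grad, direction, step_size):
    """Move along a level set: remove the gradient-aligned component
    of `direction`, so the step is orthogonal to the local gradient
    and the model's output is (to first order) unchanged."""
    g = grad / (np.linalg.norm(grad) + 1e-12)
    orth = direction - np.dot(direction, g) * g
    return x + step_size * orth

# Toy linear model f(x) = w.x, whose level sets are exact hyperplanes,
# so a single orthogonal step stays exactly on the level set.
w = np.array([1.0, 2.0, -1.0])
f = lambda x: float(np.dot(w, x))

x = np.array([0.3, 0.1, 0.5])
target_dir = np.array([1.0, 0.0, 0.0]) - x  # toward some disparate image
x_new = level_set_step(x, w, target_dir, 0.5)
```

For a deep network the gradient would be recomputed at every step, and a small corrective step added to stay near the (curved) level set; iterating toward a disparate target image is what traces out the linearly connected paths described above.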

Third, we investigate the detection of hallucinations in LLMs (outputs that are fallacious or fabricated, yet often appear plausible at first glance) using LLM-Check, an effective suite of techniques that relies only on the internal hidden representations, attention similarity maps, and output logits of an LLM. We demonstrate its efficacy across a broad range of settings and diverse datasets: from zero-resource detection to cases where multiple model generations or external databases are available at inference time, and under varying access restrictions to the original source LLM.
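LLM-Check's actual detectors are not shown here; as one simple, hypothetical example of the kind of internal signal such logit-based methods can draw on, the per-token entropy of the output distribution separates peaked (confident) predictions from flat (uncertain) ones:

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of each token's output distribution, computed
    from raw logits via a numerically stable softmax. Higher average
    entropy indicates a flatter, less confident distribution."""
    z = logits - logits.max(axis=-1, keepdims=True)  # stability shift
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

# Made-up logits over a 3-token vocabulary for two short generations:
# one sharply peaked, one nearly uniform.
confident = np.array([[10.0, 0.0, 0.0], [0.0, 9.0, 0.0]])
uncertain = np.array([[1.0, 1.1, 0.9], [0.5, 0.6, 0.4]])
```

This is only a toy single-feature detector; the point is that hallucination signals can be read off a model's internals without any external knowledge source.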

Bio

Gaurang Sriramanan is a fifth-year PhD student in Computer Science at the University of Maryland, College Park, where he is advised by Prof. Soheil Feizi. His research focuses on the safety, robustness, and reliability of AI systems, characterizing their failure modes and developing robust risk-mitigation strategies. He holds a B.S. and an M.Sc. in Mathematics from the Indian Institute of Science and an M.S. in Computer Science from the University of Maryland.


Examining Committee

Chair: Dr. Soheil Feizi

Dean's Representative: Dr. Behtash Babadi

Co-Chair:

Members: Dr. David Jacobs, Dr. Yizheng Chen, Dr. Hal Daumé

This talk is organized by Migo Gui