PhD Proposal: Measuring What Matters in Trustworthy AI: From Certified Robustness to Agentic Safety
Shoumik Saha
IRB-3137; Zoom: https://umd.zoom.us/j/8566872628?pwd=VDJ1WWZCamE2Ym9ZcGh2RjZ6YVY1Zz09&omn=97071414122&jst=2
Thursday, December 18, 2025, 11:00 am-12:30 pm
Abstract

As AI systems move into security-critical roles, trustworthiness must be established under the conditions that matter in deployment -- adaptive adversaries and low tolerance for rare but high-impact failures. My work advances a measurement-first agenda for trustworthy AI that pairs two complementary approaches: certified guarantees for well-specified threat models, and realistic adversarial evaluation for complex generative and agentic systems where formal guarantees are incomplete. On the certified side, DRSM develops a de-randomized smoothing methodology for malware detection that provides formal robustness certificates, enabling security assurances beyond empirical accuracy for safety-critical classification. On the empirical side, adversarial methods and benchmarks stress-test modern GenAI along the failure modes that dominate real-world risk: efficient red-teaming attacks against aligned language models, detection of hallucinations and unreliable outputs, and robust evaluation of AI-authorship signals under AI-polishing and adversarial paraphrasing -- particularly in low false-positive operating regimes. Building on these foundations, my forward direction targets agentic safety for tool-using AI agents acting over multiple steps, where harm and reliability are defined by downstream outcomes rather than single-turn text. The goal is to develop evaluation principles that remain effective through planning, tool interaction, memory, and environment feedback -- linking provable robustness and deployment-aligned testing into a unified framework for secure, reliable AI.

Bio

Shoumik Saha is a fourth-year Ph.D. student in Computer Science at the University of Maryland, College Park, advised by Prof. Soheil Feizi. His research focuses on the reliability and security of generative AI, including LLM alignment and hallucination mitigation, adversarial attacks and defenses, AI-agent safety, and AI-authorship detection.

This talk is organized by Migo Gui