As generative models advance across image and language applications, there is a growing need for mechanisms that establish content provenance, calibrate detection of AI-generated content, and ensure reliability, all of which bear directly on the trustworthiness of GenAI.
This work investigates robust provenance by proving a lower bound that exposes the vulnerability of imperceptible image watermarks under diffusion purification, demonstrating evasion and spoofing attacks in practice, and introducing interpretable and verifiable defenses (IConMark and DREW).
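To make the attack setting concrete, the following is a minimal sketch of generic diffusion purification, the noise-and-denoise operation referenced above: a (possibly watermarked) image is pushed partway into a diffusion model's noise schedule and then denoised back. This is not the dissertation's exact implementation; the pretrained model id, the purification strength `t_star`, and the use of the `diffusers` DDPM pipeline are illustrative assumptions.

```python
import torch
from diffusers import DDPMPipeline

# Illustrative setup: a small pretrained unconditional DDPM (assumed model id).
pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")
unet, scheduler = pipe.unet, pipe.scheduler
scheduler.set_timesteps(1000)

def purify(x0: torch.Tensor, t_star: int = 150) -> torch.Tensor:
    """Diffusion purification sketch. x0: image batch in [-1, 1], shape (B, C, H, W)."""
    noise = torch.randn_like(x0)
    # Forward process: noise the image to intermediate timestep t_star.
    x_t = scheduler.add_noise(x0, noise, torch.tensor([t_star]))
    # Reverse process: denoise step by step from t_star back to 0.
    for t in scheduler.timesteps[scheduler.timesteps <= t_star]:
        with torch.no_grad():
            eps = unet(x_t, t).sample
        x_t = scheduler.step(eps, t, x_t).prev_sample
    return x_t  # purified image; imperceptible watermark signals tend to be washed out
```

The choice of `t_star` controls the trade-off the lower bound formalizes: more noise removes more of the watermark signal but also degrades image quality.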
For AI-content detection, we formalize a robustness–reliability trade-off and show that a universal, training-free adversarial paraphrasing procedure reliably degrades diverse deployed detectors, motivating calibrated operating regimes and explicit uncertainty communication.
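As a rough illustration of detector-guided paraphrasing (not the exact procedure studied in this work), the sketch below generates several candidate rewrites of a text and keeps the one a guidance detector scores as least likely to be AI-generated. Both `paraphrase` and `detector_score` are assumed placeholders for any off-the-shelf paraphraser and AI-text detector.

```python
from typing import Callable, List

def adversarial_paraphrase(
    text: str,
    paraphrase: Callable[[str, int], List[str]],   # returns k candidate rewrites (placeholder)
    detector_score: Callable[[str], float],        # higher = more likely AI-generated (placeholder)
    k: int = 8,
) -> str:
    # Generate candidates and keep the original as a fallback.
    candidates = paraphrase(text, k) + [text]
    # Pick the candidate that minimizes the guidance detector's AI-likelihood score.
    return min(candidates, key=detector_score)
```

Because the loop only queries scores and requires no detector-specific training, a procedure of this form can be applied universally, which is what makes its transfer to diverse deployed detectors a reliability concern.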
For multimodal reliability, we surface human-readable failure modes with PRIME, provide rigorous evaluation for text-guided image editing via EditVal, and improve compositionality with lightweight controls that make quality–fidelity trade-offs explicit.
Together, these results connect theoretical limits with practical tooling to strengthen provenance, inform detection policy and practice, and expose and mitigate reliability failures in service of more trustworthy GenAI.
Mehrdad Saberi is a PhD student in the Department of Computer Science at the University of Maryland, College Park, advised by Prof. Soheil Feizi. His research interests lie in trustworthy and safe generative AI, with a particular focus on watermarking, data provenance, and detection of AI-generated content, primarily in the image and text domains. He has also worked on other AI safety topics such as interpretability and the evaluation of text-to-image models.

