
PhD Proposal: Towards Distribution-Preserving Watermarks for Large Language Models

Yihan Wu

Remote

Friday, July 5, 2024, 11:00 am-12:30 pm


Abstract

https://umd.zoom.us/j/2857756289?pwd=ZHliQi9QMGI4SkF6SEIxRWdRTDJvdz09

Watermarking techniques offer a promising way to identify machine-generated content by embedding covert information into the content that language models generate. A key challenge in this domain is preserving the distribution of the originally generated content after watermarking. Our research extends and improves upon existing watermarking frameworks, with an emphasis on the importance of a Distribution-Preserving (DiP) watermark. Unlike current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking (distribution-preserving), is detectable without access to the language model API or prompts (accessible), and is provably robust to moderate token changes (resilient). DiPmark operates by selecting a random set of tokens before each word is generated, then modifying the token distribution through a distribution-preserving reweight function that increases the probability of these selected tokens during sampling. Extensive empirical evaluation on various language models and tasks demonstrates our approach's distribution-preserving property, accessibility, and resilience, making it an effective solution for watermarking tasks that demand impeccable quality preservation.
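The two steps described above (a keyed random choice of favored tokens, then a reweight that leaves the distribution unchanged on average over keys) can be illustrated with a small toy sketch. It is not DiPmark's implementation, and every name in it is hypothetical: `keyed_permutation`, `alpha_reweight`, `generate`, `detection_score`, the SHA-256 seeding on the previous token, the strength parameter `alpha`, the green-list cutoff `gamma`, and the count-based score are all assumptions chosen for illustration. The reweight walks the vocabulary in a keyed random order and, for cumulative mass F, replaces it with max(F - alpha, 0) + max(F - (1 - alpha), 0), which pushes probability toward tokens late in the order. Detection only needs the key and the text, not the model or the prompt.

```python
import hashlib
import itertools
import math
import random
from typing import Callable, List, Sequence

# Toy illustration only: the seeding scheme, reweight, and score are assumptions,
# not the DiPmark reference code.


def keyed_permutation(key: bytes, prev_token: int, vocab_size: int) -> List[int]:
    """Pseudorandom ordering of the vocabulary, seeded by a secret key and the previous token."""
    seed = hashlib.sha256(key + prev_token.to_bytes(8, "big")).digest()
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)
    return perm


def alpha_reweight(probs: Sequence[float], perm: Sequence[int], alpha: float) -> List[float]:
    """Shift mass toward tokens late in `perm`: cumulative mass F becomes
    max(F - alpha, 0) + max(F - (1 - alpha), 0)."""
    def shifted(x: float) -> float:
        return max(x - alpha, 0.0) + max(x - (1.0 - alpha), 0.0)

    out = [0.0] * len(probs)
    cum = 0.0
    for tok in perm:
        before = cum
        cum += probs[tok]
        out[tok] = shifted(cum) - shifted(before)
    return out


def average_over_permutations(probs: Sequence[float], alpha: float) -> List[float]:
    """Brute-force check for tiny vocabularies: average the reweighted distribution over every ordering."""
    perms = list(itertools.permutations(range(len(probs))))
    sums = [0.0] * len(probs)
    for perm in perms:
        for tok, p in enumerate(alpha_reweight(probs, perm, alpha)):
            sums[tok] += p
    return [s / len(perms) for s in sums]


def generate(next_probs: Callable[[int], Sequence[float]], first_prev: int, n: int,
             key: bytes, alpha: float, seed: int) -> List[int]:
    """Sample n tokens; next_probs(prev) stands in for the language model's next-token distribution."""
    rng = random.Random(seed)
    tokens: List[int] = []
    prev = first_prev
    for _ in range(n):
        probs = next_probs(prev)
        perm = keyed_permutation(key, prev, len(probs))
        weights = alpha_reweight(probs, perm, alpha)
        prev = rng.choices(range(len(probs)), weights=weights, k=1)[0]
        tokens.append(prev)
    return tokens


def detection_score(tokens: Sequence[int], key: bytes, vocab_size: int, gamma: float = 0.5) -> float:
    """Count tokens in the last (1 - gamma) fraction of their keyed ordering; needs no model or prompt."""
    n = len(tokens) - 1  # the first token has no in-text predecessor to seed its ordering
    green = 0
    for prev, tok in zip(tokens, tokens[1:]):
        if keyed_permutation(key, prev, vocab_size).index(tok) >= gamma * vocab_size:
            green += 1
    # Averaged over keys, an unwatermarked token lands in the green part with probability (1 - gamma).
    return (green - (1.0 - gamma) * n) / math.sqrt(n)
```

Averaging `alpha_reweight` over all orderings of a tiny vocabulary recovers the original probabilities, which is the distribution-preserving property in miniature: each ordering and its reverse together assign a token exactly twice its original probability. A single keyed ordering is still strongly biased, and that bias is what the detector counts. For example, with a uniform 64-token distribution and `alpha=0.5`, every watermarked token falls in the green half, so 101 generated tokens give a score of 5.0.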

Bio

Yihan Wu is a PhD student at the University of Maryland, College Park, advised by Dr. Heng Huang. His work broadly covers the robustness and generalization of deep neural networks, and his recent research focuses on the safety and efficiency of large language models (LLMs).

This talk is organized by Migo Gui