Aligned large language models (LLMs) can largely follow verbal prompts to generate desired content. However, some control objectives, such as certain reward functions, cannot be described verbally, and LLMs' ability to follow instructions is imperfect, as demonstrated by hallucinations and jailbreak vulnerabilities. In these cases, fine-grained control of LLMs can achieve non-verbal objectives, enabling more tasks and enhancing the models' abilities to follow instructions and reason, ultimately maximizing their utility to humans.
In this proposal, we introduce a framework for fine-grained control and present two simple instantiations to demonstrate its potential. First, we provide a more unified understanding of controllable generation for LLMs from a sampling perspective, from which we derive a general framework for fine-grained controllable generation. Then, we instantiate the framework on two specific objectives: generating coherent jailbreak prompts and generating harmless prompts that trigger LLMs' false refusals. We also demonstrate on a classification task that allowing models to refine their outputs has the potential to enhance reasoning capabilities. Finally, we outline the steps required to fully implement this framework, with the ultimate goal of achieving fine-grained control over LLMs.
Sicheng Zhu is a PhD student at the University of Maryland, College Park, where he is advised by Prof. Furong Huang. His research focuses on trustworthy machine learning, including robustness to distribution shift and adversarial examples, and the intersection of these areas with geometric deep learning.