This dissertation investigates how language models, including contemporary LLMs, can perpetuate social biases related to gender, race, and ethnicity as inferred from first names. Guided by the principle of counterfactual fairness, we use name substitution to uncover, understand, and mitigate these biases across three domains: stereotypes about personal attributes, occupational bias, and overgeneralized assumptions about romantic relationships.
By analyzing model behavior across diverse names, this dissertation reveals patterns of unfair treatment, including demographically influenced personality judgments in social commonsense reasoning, hiring discrimination based on gender, race, and ethnicity, and heteronormative bias in relationship predictions. To address these issues, we propose open-ended diagnostic frameworks, interpretability analyses based on contextualized embeddings, and a novel consistency-guided fine-tuning method.
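To make the name-substitution idea concrete, the sketch below holds a sentence fixed, varies only the first name, and checks whether a model's prediction shifts. It is a minimal illustration rather than the dissertation's actual experimental setup: the template, the name list, and the off-the-shelf sentiment classifier are assumptions chosen for demonstration.

```python
# Minimal sketch of a counterfactual name-substitution probe (illustrative only).
# The template, name list, and sentiment classifier are assumptions for
# demonstration, not the dissertation's benchmarks or models.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

TEMPLATE = "{name} is known around the office for how they handle deadlines."
NAMES = ["Emily", "Lakisha", "Jamal", "Brendan"]  # hypothetical name set

for name in NAMES:
    text = TEMPLATE.format(name=name)
    pred = classifier(text)[0]
    # Under counterfactual fairness, the prediction should be (near) invariant to the name.
    print(f"{name:>8}: {pred['label']} ({pred['score']:.3f})")
```

Systematic score gaps across names on otherwise identical inputs are the kind of signal the dissertation's diagnostics are designed to surface and that its consistency-guided fine-tuning aims to reduce.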
Together, these contributions aim to build fairer, more interpretable, and more inclusive language technologies.
Haozhe An is a Ph.D. candidate in the Department of Computer Science, advised by Professor Rachel Rudinger, and a member of the CLIP lab. His research aims to understand and mitigate issues of bias and unfairness in NLP systems.