Recalibrating Machine Learning for Social Biases
Abstract
Research on bias and fairness in natural language processing (and, more broadly, machine learning) abounds, yet evidence of harms from these data-driven technologies continues to surface. I argue that the problem of data and model bias needs to be reframed. Rather than trying to remove bias, we should aim to communicate bias. Rather than trying to create neutral or universal models, we should aim to create situated, community-centered models. Drawing on feminist theories, critical discourse analysis, and critical heritage studies, I explain the inevitability of bias in data, particularly language data. Then, I outline an alternative to the predominant, top-down approach to creating datasets and models, and describe how I implemented it in my research on gender-biased text classification for archives. I close the talk with a vision of the possibilities that a bottom-up approach to dataset and model creation offers.
Bio
Dr. Lucy Havens is a researcher, data scientist, designer, and consultant working at the intersection of natural language processing, human-computer interaction, critical data studies, and cultural heritage. In 2019, she won the Edinburgh College of Art Purchase Award for her exhibited work, Physically Encoding Collection Metadata. In 2024, she earned a Ph.D. from the University of Edinburgh, where she created a natural language processing approach to investigating gender biases in archival catalog metadata descriptions.
This talk is organized by Naomi Feldman.