Following progress in computing and machine learning algorithms, together with the emergence of big data, artificial intelligence (AI) has become a reality woven into every fabric of our algorithmic society. Despite the explosive growth of machine learning, the misconception persists that because machines operate on zeros and ones, they must be objective. Why, then, does Google Translate convert the Turkish sentences “O bir doktor. O bir hemşire.”, which use a gender-neutral pronoun, into the English sentences “He is a doctor. She is a nurse.”? As data-driven machine learning brings forth a plethora of challenges, I analyze what can go wrong when algorithms that acquire statistical knowledge of language from historical human data make decisions on behalf of individuals and society.
In this talk, I show how we can repurpose machine learning as a scientific tool to discover facts about artificial and natural intelligence and to assess social constructs. In the first part, I focus on individuals and demonstrate how machines that learn an individual’s unique linguistic style in a supervised learning setting can infringe on privacy. In the second part, I shift the focus to society and show that machines trained on societal linguistic data inevitably inherit the biases of that society. To do so, I derive a method that investigates the constructs embedded in language models trained on billions of sentences collected from the World Wide Web. I conclude the talk with future directions and open research questions in the field of ethics of machine learning.
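The method alluded to above measures associations between word vectors learned from text. As a minimal sketch of the idea (not the speaker’s actual implementation), the snippet below computes how much more strongly a target word associates with one attribute set than another via cosine similarity. All embeddings here are made-up two-dimensional toy vectors chosen purely for illustration; real experiments use embeddings trained on web-scale corpora.

```python
# Toy illustration of a word-embedding association measure.
# The vectors below are hypothetical, hand-picked 2-d examples,
# NOT real embeddings from any trained language model.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings: first axis leans "male", second leans "female".
vectors = {
    "he": (1.0, 0.1), "man": (0.9, 0.0),
    "she": (0.1, 1.0), "woman": (0.0, 0.9),
    "doctor": (0.8, 0.3), "nurse": (0.3, 0.8),
}

def association(word, attrs_a=("he", "man"), attrs_b=("she", "woman")):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    mean_sim = lambda attrs: sum(
        cosine(vectors[word], vectors[a]) for a in attrs
    ) / len(attrs)
    return mean_sim(attrs_a) - mean_sim(attrs_b)

print(association("doctor"))  # positive: closer to the male attribute words
print(association("nurse"))   # negative: closer to the female attribute words
```

A positive score means the target word sits closer to the first attribute set in the embedding space; applied to real corpus-derived embeddings, such differential associations are how human-like biases can be quantified.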
Aylin Caliskan is a postdoctoral researcher and a fellow at Princeton
University’s Center for Information Technology Policy. Her research
interests include the emerging science of bias in machine learning and
fairness, AI ethics, data privacy, and security. Her work aims to
characterize and quantify aspects of artificial and natural
intelligence using a multitude of machine learning and language
processing techniques. In her recent publication in Science, she
demonstrated how semantics derived from language corpora contain
human-like biases. Prior to that, she developed novel privacy attacks
to de-anonymize programmers using code stylometry. Her presentations
on de-anonymization and on bias in machine learning have both
received best-talk awards, and her work on semi-automated
anonymization of writing style received the Privacy Enhancing
Technologies Symposium Best Paper Award. Her research has received
extensive press coverage across the globe, contributing to public
awareness of the risks of AI. Aylin holds a PhD in Computer Science
from Drexel University and a Master of Science in Robotics from the
University of Pennsylvania.