Towards AI Alignment: Advancing Fairness, Reliability, and Human-Like Perception in AI
Bang An
Abstract
As artificial intelligence (AI) increasingly shapes many aspects of society, ensuring its trustworthiness and alignment with human values has become crucial. This study advances AI alignment by enhancing the fairness, reliability, and human-like perception of AI systems.
Firstly, we tackle the challenge of maintaining fairness in an ever-changing world. Recognizing that the common assumption of identical training and test data distributions is often unrealistic, we introduce a technique that keeps models unbiased even under distribution shifts. Secondly, we explore enhancing the understanding capabilities of Vision Language Models (VLMs) by mimicking human visual perception. Our training-free method improves both the accuracy and robustness of zero-shot visual classification. Lastly, we delve into the reliability of generative AI. We benchmark the robustness of image watermarks used to identify AI-generated images. Our benchmark reveals several critical vulnerabilities in popular watermarks and guides the development of more secure ones. Our ongoing work focuses on the reliability of Large Language Models (LLMs). We observe that many LLMs exhibit exaggerated safety and propose an approach to automatically identify falsely refused cases. Building on this technique, we discuss future work on fine-grained alignment.
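For context, the zero-shot visual classification setting targeted by the second contribution can be illustrated with a CLIP-style VLM: each candidate class is written as a text prompt, and the image is assigned to the label whose text embedding is most similar to the image embedding. The sketch below shows only this standard baseline setting, assuming the Hugging Face transformers CLIP API with a placeholder checkpoint, image path, and labels; it is not the training-free method presented in the talk.

    # Minimal sketch of zero-shot classification with a CLIP-style VLM (baseline setting only).
    # Model name, image path, and labels are illustrative placeholders.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    labels = ["a photo of a dog", "a photo of a cat", "a photo of a bird"]
    image = Image.open("example.jpg")  # any RGB image

    # Encode the image and all candidate text prompts in one batch.
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Image-text similarity scores become a probability distribution over the labels.
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))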
This study confronts key challenges in AI alignment, enriching the knowledge base to direct AI technology toward maximizing societal benefits and minimizing risks. It offers comprehensive approaches to enhancing AI systems, aligning them more closely with human values, and paving the way for future innovations in reliable AI.
Examining Committee
Bio
Bang is a fourth-year PhD student in Computer Science at the University of Maryland, advised by Prof. Furong Huang. Before coming to UMD, she was a research staff member at IBM Research China. She received her bachelor’s degree from Northeastern University (China) and her master’s degree from Tsinghua University. Her research focuses on Reliable Machine Learning, with a particular interest in understanding and improving the robustness, fairness, generalization, and interpretability of deep learning models.
This talk is organized by Migo Gui