Detrimental online behavior, such as harassment and cyberbullying, is becoming a serious, large-scale problem that damages people’s lives. This phenomenon creates a need for automated, data-driven detection of such behavior. In this talk, I’ll present a machine learning method for detecting online harassment that jointly considers social structure and language usage. To address the elusive nature of online behavior, the learning algorithm uses weak supervision: annotators provide a small seed vocabulary of bullying indicators, and the algorithm uses a large, unlabeled corpus of social-media interactions to train models of harassment based both on who participates and on what language is used. The algorithm tries to maximize the agreement between these estimates, unifying different perspectives on the problem. I’ll discuss quantitative and qualitative evaluations of our method on social-media datasets that demonstrate its effectiveness in harassment detection. Then I’ll discuss some bigger-picture questions surrounding the use of machine learning for detection of detrimental online behavior.
Bert Huang is an assistant professor in the Department of Computer Science at Virginia Tech, where he directs the Machine Learning Laboratory. He earned his Ph.D. from Columbia University in 2011, and spent three years as a postdoctoral research associate at the University of Maryland, College Park before joining Virginia Tech in 2015. His research investigates machine learning, with a focus on analyzing complex systems. His work addresses topics including structured prediction, probabilistic graphical models, and computational social science. His papers have been published at conferences including NIPS, ICML, UAI, and AISTATS, and he is an action editor for the Journal of Machine Learning Research.