What can text analysis tell us about society? Enormous corpora of news, social media, and historical documents record events, beliefs, and culture. Automated text analysis is appealing because it scales to large datasets and can help discover patterns and themes. My research develops practical and scientifically rigorous text analysis methods that can help answer research questions in sociolinguistics and political science.
For this talk I'll focus on our work on events and international politics. Political scientists study international relations through event data: time series records of who did what to whom, as described in news articles. Rule-based information extraction systems have been used for 20 years to study these phenomena. We develop a dynamic logistic normal statistical model for unsupervised learning of event classes and political dynamics from news text. It learns which verbs and textual descriptions correspond to different types of diplomatic and military interactions between countries, and simultaneously infers the time series of interactions between countries. Unlike a standard topic model, it leverages syntactic parsing and argument structure, which is critical in this domain. Using a parsed corpus of several million news articles spanning 15 years, we evaluate how well its learned event classes match those defined by experts in previous work and how well its inferences about countries correspond to real-world conflict, and we conduct a qualitative case study illustrating its inferences for the recent history of Israeli-Palestinian relations.
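The "dynamic logistic normal" idea can be illustrated with a small generative sketch. This is not the model from the paper. It assumes a single country pair, a fixed number of event classes, and a Gaussian random walk on the log-odds with a hypothetical step size `step_sd`; all function names are illustrative. The full model also learns which verb paths belong to each event class, which this sketch omits.

```python
import math
import random

def softmax(logits):
    """Map a vector of real-valued log-odds onto the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_dyad_dynamics(num_steps, num_classes, step_sd=0.5, seed=0):
    """Draw a time series of event-class distributions for one
    (source country, target country) pair.

    Hypothetical illustration: the log-odds follow a Gaussian random walk,
    eta_t ~ Normal(eta_{t-1}, step_sd^2 * I), and the distribution over
    event classes at time t is softmax(eta_t), i.e. logistic normal.
    """
    rng = random.Random(seed)
    eta = [0.0] * num_classes  # start from a uniform distribution
    series = []
    for _ in range(num_steps):
        # Each log-odds coordinate drifts independently by Gaussian noise.
        eta = [e + rng.gauss(0.0, step_sd) for e in eta]
        series.append(softmax(eta))
    return series

def sample_events(theta, num_events, rng):
    """Draw event-class labels for one time step from its distribution theta."""
    return rng.choices(range(len(theta)), weights=theta, k=num_events)
```

For example, `sample_dyad_dynamics(15, 4)` yields 15 distributions over 4 event classes that change smoothly from step to step. The random walk in log-odds space is what lets neighboring time steps share statistical strength, while the softmax keeps each step a valid probability distribution.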
This is joint work with Brandon M. Stewart (Harvard University) and Noah A. Smith (CMU). Publication (ACL 2013) and more information here: http://brenocon.com/irevents/
Brendan O'Connor is a fifth-year Ph.D. candidate in Carnegie Mellon University's Machine Learning Department. He is interested in machine learning and natural language processing, especially when informed by or applied to the social sciences. He has previously interned with the Facebook Data Science group and worked on crowdsourcing (CrowdFlower / Dolores Labs) and "semantic" search (Powerset). His undergraduate degree is in Symbolic Systems.

