As part of an emerging discipline of "computational epidemiology", public health researchers and practitioners have turned to user-generated text data, such as social media messages, as viable sources of timely information. These emerging data streams can augment traditional health surveillance tools by providing trends faster, with wider geographic coverage and at finer geographic granularity, and can give insights into areas that are traditionally difficult to measure at population scale, including public perceptions, sentiment, and behavioral response.
In this talk, I'll show how data mined from user-generated web content can inform three interesting applications in public health and behavioral medicine: (1) tracking the prevalence of influenza in the US and around the world (through Twitter); (2) monitoring rates of air pollution throughout China (through Sina Weibo); and (3) understanding how people use the web to make important medical decisions (through search query logs). The common theme is that all three tasks can be improved by using NLP to filter for "experiential" data: messages indicating that the author is personally experiencing the relevant event (e.g., a flu infection) rather than expressing a more general awareness or concern. I'll also discuss how I'd like to see (text-driven) computational epidemiology grow as a field, and where NLP can make the largest contributions.
This talk covers joint work with Mark Dredze (JHU), David Broniatowski (GWU), Shiliang Wang (JHU), Michael Smith (JHU), Alex Lamb (JHU, now Amazon), Ryen White (MSR), and Eric Horvitz (MSR).
Michael Paul is a PhD candidate in Computer Science at Johns Hopkins University. He earned an M.S.E. in CS from Johns Hopkins University in 2012 and a B.S. in CS from the University of Illinois at Urbana-Champaign in 2009. He has received PhD fellowships from Microsoft Research, the National Science Foundation, and the Johns Hopkins University Whiting School of Engineering. His research focuses on exploratory machine learning and natural language processing for the web and social media, with applications to computational epidemiology and public health informatics.