Talks

Detecting Sentiment and Identifying Influence in Social Media

Prem Melville - IBM Research, Machine Learning Group

Wednesday, April 11, 2012, 1:30-2:30 pm

Abstract

The explosion of user-generated content on the Web has led to new opportunities and significant challenges for companies that are increasingly concerned about monitoring the discussion around their products. As such, marketing organizations need to be aware of what people are saying in influential blogs, how the expressed opinions could impact their business, and how to extract business insight and value from these blogs. This has given rise to the emerging discipline of Social Media Analytics, which draws from Social Network Analysis, Machine Learning, Data Mining, Information Retrieval, and Natural Language Processing. This talk discusses two fundamental challenges in the analysis of social media – detecting sentiment and identifying influence in networks.

Sentiment Analysis focuses on the task of automatically identifying whether a piece of text expresses a positive or negative opinion about the subject matter. Most previous work in this area uses prior lexical knowledge in terms of the sentiment-polarity of words. In contrast, some recent approaches treat the task as a text classification problem, where they learn to classify sentiment based only on labeled training data. In this talk, we present a unified framework in which one can use background lexical information in terms of word-class associations, and refine this information for specific domains using any available training examples. This work has led to the formulation of a general Machine Learning framework called Dual Supervision, where classifiers can be built using both example labels and “feature labels.”
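As an illustration only, the idea of combining lexical background knowledge with labeled examples can be sketched with a small Naive Bayes classifier in which lexicon words act as pseudo-counts ("feature labels") and labeled documents add ordinary counts ("example labels"). This is not necessarily the framework presented in the talk; the word lists, the `lexicon_weight` parameter, and the function names are all assumptions made for the sketch.

```python
import math
from collections import Counter

# Hypothetical background lexicon: words with a known prior polarity.
POSITIVE_WORDS = {"great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "hate", "terrible"}


def train(labeled_docs, lexicon_weight=5.0, smoothing=1.0):
    """Build per-class word log-probabilities from a lexicon plus labeled text.

    labeled_docs: iterable of (text, label) pairs, label in {"pos", "neg"}.
    lexicon_weight: pseudo-count given to each lexicon word in its class
                    (an assumed knob controlling how much the lexicon counts).
    """
    counts = {"pos": Counter(), "neg": Counter()}

    # Feature labels: seed each class with pseudo-counts for its lexicon words.
    for word in POSITIVE_WORDS:
        counts["pos"][word] += lexicon_weight
    for word in NEGATIVE_WORDS:
        counts["neg"][word] += lexicon_weight

    # Example labels: add word counts from domain-specific training documents.
    for text, label in labeled_docs:
        counts[label].update(text.lower().split())

    vocab = set(counts["pos"]) | set(counts["neg"])
    model = {}
    for label, class_counts in counts.items():
        total = sum(class_counts.values()) + smoothing * len(vocab)
        model[label] = {
            word: math.log((class_counts[word] + smoothing) / total)
            for word in vocab
        }
    return model


def classify(model, text):
    """Return the class with the higher summed log-probability.

    Words outside the vocabulary contribute nothing to either class;
    class priors are assumed uniform.
    """
    words = text.lower().split()
    scores = {
        label: sum(log_probs.get(word, 0.0) for word in words)
        for label, log_probs in model.items()
    }
    return max(scores, key=scores.get)
```

With no labeled documents the classifier falls back on the lexicon alone; adding a few domain examples lets it pick up domain-specific cues (for instance, "unpredictable" being positive for a movie plot) that the generic lexicon does not contain.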

Much work in Social Network Analysis has focused on identifying the most important actors in a social network. This has resulted in several measures of influence, authority, centrality, or prestige. Most such sociometrics (e.g., PageRank) are driven by intuitions based on an actor’s location in a network. It is our position that asking for the “most influential” actors is an ill-posed question unless it is put in the context of a specific, measurable task. Constructing a predictive task of interest in a given domain provides a mechanism for quantitatively comparing different measures of influence. Furthermore, when we know what type of actionable insight to gather, we need not rely on a single network prestige measure. A combination of measures is more likely to capture the various aspects of the social network that are predictive and beneficial for the task. To this end, we introduce supervised rank aggregation techniques and show the benefits of locally-optimal order-based rank aggregation. We illustrate these ideas through a case study on a data set of 40 million Twitter users, where we study measures of influence in the context of predicting when users will be rebroadcast.
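To make the notion of supervised rank aggregation concrete, here is a minimal sketch that combines several influence rankings with a weighted Borda count, where each ranking's weight is its precision at k on a set of users known to have been rebroadcast. This is a simple stand-in chosen for illustration, not the locally-optimal order-based technique from the talk; the measure names, the `relevant_train` set, and the choice of precision at k as the weighting signal are all assumptions.

```python
def borda_scores(ranking):
    """Give the top item n points, the next n - 1, and so on down to 1."""
    n = len(ranking)
    return {item: n - position for position, item in enumerate(ranking)}


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked items that appear in the relevant set."""
    return sum(1 for item in ranking[:k] if item in relevant) / k


def supervised_borda(rankings, relevant_train, k=2):
    """Aggregate rankings with Borda scores weighted by task performance.

    rankings: dict mapping a measure name to a list of users, best first.
    relevant_train: users observed to be rebroadcast in a training period
                    (hypothetical supervision signal for this sketch).
    Returns (aggregated ranking, per-measure weights).
    """
    # Supervision step: measures that predict the task better get more weight.
    weights = {
        name: precision_at_k(ranking, relevant_train, k)
        for name, ranking in rankings.items()
    }

    combined = {}
    for name, ranking in rankings.items():
        for item, score in borda_scores(ranking).items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score

    # Sort by descending combined score; break ties by name for determinism.
    aggregated = sorted(combined, key=lambda item: (-combined[item], item))
    return aggregated, weights
```

For example, given a follower-count ranking `["b", "a", "c", "d"]` and a PageRank ranking `["c", "a", "d", "b"]`, with users `a` and `c` known to have been rebroadcast, the PageRank ranking receives the larger weight and dominates the aggregated order.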

Bio

Prem Melville received a Ph.D. in Computer Science from the University of Texas at Austin, and is currently a Research Scientist in the Machine Learning Group at IBM Research. His research interests lie in Machine Learning and Data Mining. More specifically, he has worked on active learning, ensemble methods, active feature-value acquisition, sentiment analysis, dual supervision, rank aggregation, recommender systems, text classification, and applications of data mining to social media analytics, business analytics, and e-commerce. Recently, Prem has served on the organizing committees of CIKM 2012, ICML 2011, MLG 2011, KDD 2010, and WSDM 2010. He also organized the first workshop on Budgeted Learning at ICML 2010, the first workshop on Social Media Analytics (SOMA 2010) at KDD, and the workshop on Mining and Learning with Graphs (MLG 2011). Prem also serves on the Editorial Board of Data Mining and Knowledge Discovery. He received the Best Application Paper Award at KDD 2010, and won KDD Cup 2009, KDD Cup 2008, and the INFORMS Data Mining Contest 2008.

This talk is organized by Jay