Computational style analysis, with practical applications to automatic summarization
Friday, November 8, 2013, 11:00 am-12:00 pm
Abstract

Natural language research is often equated with attempts to derive structure and meaning from unstructured data. Given this focus, the computational treatment of style has remained largely unexplored. In this talk I will argue that elements of style, such as the redundancy in a text, its level of specificity, or its entertaining effect, affect the performance of standard systems, and that good approaches to computational style analysis will benefit such systems.

I will first briefly present our findings from a corpus study of local coherence, which motivate the need for style analysis. Theories of discourse relations and entity coherence fail to explain local coherence for a large portion of newspaper text, and stylistic factors are often involved in the cases where the standard theories do not apply.

Next I will present our work on developing measures of one particular element of style, text specificity. I will discuss how we developed a classifier for sentence specificity. The classifier allows us to analyze specificity in summaries produced by people and by machines, and it reveals that machine summaries are overly specific. Furthermore, analysis of sentence compression data shows that, when summarizing, people often edit a specific sentence in the source into a general one for the summary. This indicates that specificity is a suitable objective for compression systems, one that naturally leads to the need for compression.

I will conclude with a brief overview of other style-related tasks which affect the performance of summarization systems. 

Bio

Ani Nenkova is an Assistant Professor of Computer and Information Science at the University of Pennsylvania. Her main areas of research are automatic text summarization, affect recognition and text quality. She obtained her PhD in Computer Science from Columbia University in 2006, then spent a year and a half as a postdoctoral fellow at Stanford University before joining Penn in Fall 2007. Ani and her collaborators received the best student paper award at SIGDial in 2010 and the best paper award at EMNLP in 2012. She received an NSF CAREER award in 2010. The Penn team co-led by Ani won the audio-visual emotion recognition challenge (AVEC) for word-level prediction in 2012. Ani was a member of the editorial board of Computational Linguistics (2009-2011) and has served as an area chair/senior program committee member for ACL (2013, 2012, 2010), NAACL (2010, 2007), AAAI (2013) and IJCAI (2011).

This talk is organized by Jimmy Lin