ent possibilities—the issues to focus on (and/or exclude), the aspects to highlight within
any issue, the narratives to include, and more. These choices, deliberate or not, are socially
structured. The ever-increasing availability of unstructured large-scale textual data, in part
due to the bulk of communication and information dissemination happening in online or
digital spaces, makes natural language processing (NLP) techniques a natural fit for
understanding socially situated choices (communicative choices) from that textual data.
Among NLP methods, unsupervised ones are often needed, since large-scale textual data in the wild often lacks accompanying labels, and any existing labels or
categorization may not be appropriate for answering specific research questions.
This proposal seeks to address the following question: how can we use unsupervised
NLP methods to study texts authored by specific people or institutions in order to effectively explicate the communicative choices being made as well as investigate their potential motivations, context-based variation, and consequences?
Our first set of contributions centers on methodological innovation. We focus on topic
modeling, a class of generally unsupervised NLP methods that can automatically discover authors' communicative choices in the form of topics or categorical themes present
in a collection of documents. We introduce a new neural topic model (NTM) that effectively incorporates contextualizing sequential knowledge. Next, we find critical gaps in
the near-universal automated evaluation paradigm used to compare models in
topic modeling methods research, and we then operationalize evaluation criteria that are grounded in the needs of the well-defined use case of content analysis. The latter two works call into question much of the recent work in NTM development claiming
“state-of-the-art” results and emphasize the importance of validating the outputs of NLP methods.
To use unsupervised NLP methods to investigate potential motivations, context-based
variation, and consequences of communicative choices, we link textual data with information about the authors, social contexts, and media involved in their production, and use these connected information sources to support empirical research in the social sciences.
In our second set of contributions, we analyze a previously unexplored connection between a politician’s donors and their communicative choices in their floor speeches to show
how donations influence issue attention in the US Congress, enabling a new look at money in
politics and providing an example of studying motivations behind communicative choices.
Our third set of contributions uses text-based ideal point extraction to better understand
the role of institutional constraints and audience considerations in the varying expression
and ideological positioning of politicians. Domain experts validate and annotate modeling outputs to establish the reliability of the automated tool. Proposed work will extend
the existing text-based ideal point extraction tool, validate our new method, and use it for
empirical research on the impact of issue-context on ideological frames.
In our fourth set of contributions, we demonstrate the potential of both unsupervised
NLP techniques and social network data and methods for better understanding the downstream consequences of communicative choices by focusing on misinformation narratives in mainstream media, treating misinformation as something beyond just false claims published by certain bad actors.
Our final piece of proposed work will use our experiences with diverse kinds of data
and methods to make our fifth set of contributions: a way of finding and analyzing perspectives not present in (or excluded from) one particular discourse. Specifically, we will
propose a new method and create a new dataset to find cases where certain themes and
frames (communicative choices) present in the public discourse (social media, open-ended
surveys, etc.) on a specific issue are absent from, or not given attention in, elite discourse
(government communiques, mainstream news media, scientific literature).
Pranav Goel (he/him) is a 5th-year Ph.D. student in Computer Science at the University of Maryland, College Park. He is advised by Prof. Philip Resnik as part of the Computational Linguistics and Information Processing (CLIP) Lab, often working with collaborators (especially political and social scientists) from other departments and labs both at and outside UMD. His research lies at the intersection of Natural Language Processing and Computational Social Science. He looks at how NLP methods can be used to study texts authored by specific people or institutions in order to effectively explicate the socially structured communicative choices being made, as well as investigate the potential motivations, context-based variation, and consequences of those choices. An underlying theme of his work is analyzing discourse in the form of text data to help hold elite institutions accountable.