Constructs such as inflation, populism, or paranoia are of fundamental concern to social science. Constructs are the vocabulary over which theory operates, and so a central activity is the development and measurement of latent constructs from observable data. Although the social sciences comprise fields with different epistemological norms, they share a concern for valid operationalizations that transparently map between data and measurement. Economists at the US Bureau of Labor Statistics, for example, follow a hundred-page handbook to sample the egg prices that contribute to CPI-U; clinical psychologists rely on suites of psychometric tests to diagnose schizophrenia.
In many fields, this observable data takes the form of language: as a social phenomenon, language can encode many of the latent constructs that social scientists care about. As language technologies have grown more sophisticated and the amount of available data has increased, a "text-as-data" paradigm has emerged, aimed at "amplifying and augmenting" the analyses that contribute to research. At the same time, Natural Language Processing (NLP), the field from which these analysis tools originate, tends to remain separate from real-world problems and guiding theories.
This proposal focuses on NLP methods and evaluations that facilitate two core activities in the social sciences: the development and the measurement of latent constructs from natural language. These efforts remain sensitive to the need for interpretability and validity. Existing work includes new methods to facilitate the inductive conceptualization of constructs, as well as the validation of existing methods for this use case. Ongoing and proposed work focuses on the human-centered measurement of constructs in text.
Alexander Hoyle is a PhD student in Computer Science at the University of Maryland, advised by Philip Resnik. His research is focused on the development and evaluation of NLP methods for computational social science.