One of the foundational goals of artificial intelligence is to build intelligent agents that interact with humans. To do so, such agents must be able to infer from human communication which concept a span of symbols refers to. Furthermore, like humans, they should be able to map these representations to perceptual inputs, visual or otherwise. In NLP, the problem of discovering which spans of text refer to the same real-world entity or event is called coreference resolution. While coreference resolution is a constrained NLP problem, this dissertation expands its scope beyond text, mapping the concepts that text spans refer to onto concepts represented in the visual domain. This dissertation also investigates why real-world coreference resolution is complex and difficult. Lastly, it goes beyond discovering entity concepts referred to by contiguous spans of text, and discovers prototypical concepts that are distributed across a block of text.
A central theme throughout this thesis is the scarcity of annotated data for the hard problems of discovering and mapping concepts, a challenge this work addresses. To investigate hard textual coreference, this dissertation analyzes a coreference-heavy domain: questions from the trivia game of quiz bowl. Solving quiz bowl questions requires robust coreference resolution and world knowledge, which humans possess but current models lack. I address this problem by incorporating distributional semantics into simple models. I also focus on the sub-problem of mention detection, which has hitherto been neglected because few coreference resolution datasets are annotated with singletons. Next, to investigate complex visual representations of concepts, this dissertation uses the domain of paintings. Mapping spans of text in descriptions of paintings to the regions of the paintings that the text describes is a non-trivial problem, because paintings are substantially harder to interpret than natural images. This problem is also addressed with the help of distributional semantics. Like hard coreference, it suffers from a lack of annotated datasets of paintings. Finally, to discover prototypical concepts expressed in distributed rather than contiguous spans of text, this dissertation investigates a source of text that is rich in such concepts: movie scripts. Movie narratives, character arcs, and character relationships are distilled into sequences of interconnected prototypical concepts, which are discovered by unsupervised deep learning models that also rely on distributional semantics. I conclude this dissertation by discussing potential future research on downstream tasks that could benefit from the discovery of multi-modal referring concepts.
Chair: Dr. John Aloimonos
Dean’s Representative: Dr. Philip Resnik
Members: Dr. Jordan Boyd-Graber
Dr. Cornelia Fermuller
Dr. Hal Daume