PhD Proposal: Discriminative Interlingual Representations for NLP
Jagadeesh Jagarlamudi - University of Maryland, College Park
Monday, April 16, 2012, 9:30-10:30 am
Abstract

THE PRELIMINARY ORAL EXAMINATION FOR THE DEGREE OF Ph.D. IN COMPUTER SCIENCE FOR

                               Jagadeesh Jagarlamudi

In many multilingual natural language processing (NLP) tasks, such as name transliteration and mining bilingual word translations, the language barrier can be overcome by mapping objects (names and words, in the respective tasks) from different languages (or “views”) into a common low-dimensional subspace. Multi-view models learn such a low-dimensional subspace from a training corpus of paired objects, e.g. names written in different languages, represented as feature vectors.
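A standard multi-view technique for learning such a shared subspace from paired feature vectors is canonical correlation analysis (CCA); the discriminative models in this proposal differ, but a minimal CCA sketch (using numpy; all names here are illustrative) shows the core idea of projecting two views into one low-dimensional space:

```python
import numpy as np

def cca(X, Y, k, reg=1e-3):
    """Learn projections of two views into a shared k-dim subspace via CCA.

    X, Y: (n, dx) and (n, dy) matrices of paired objects, one row per pair
    (e.g. a name's character n-gram features in two languages).
    Returns Wx (dx, k) and Wy (dy, k) so that X @ Wx and Y @ Wy are
    maximally correlated, dimension by dimension.
    """
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    # Regularized within-view and cross-view covariances
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten each view via Cholesky factors, then SVD the cross-covariance
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt[:k].T)
    return Wx, Wy

# Toy paired data: two noisy views of the same 2-d latent objects
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
Wx, Wy = cca(X, Y, k=2)
px = (X - X.mean(0)) @ Wx
py = (Y - Y.mean(0)) @ Wy
top_corr = np.corrcoef(px[:, 0], py[:, 0])[0, 1]
```

Once projected, objects from either language can be compared directly (e.g. by cosine distance) in the shared space, which is what enables mining translation pairs.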

The central idea of my dissertation is to learn low-dimensional subspaces (or interlingual representations) that are effective for various multilingual and monolingual NLP tasks. First, I demonstrate the effectiveness of interlingual representations in mining bilingual word translations, and then proceed to develop models for diverse situations that often arise in NLP tasks. In particular, I design models for the following problem settings: 1) when there are more than two views but we only have training data pairing a single pivot view with each of the remaining views; 2) when an object from one view is associated with a ranked list of objects from another view; and finally 3) when the underlying objects have rich structure, such as a tree.

These problem settings arise frequently in real-world applications. I choose a canonical task for each setting and compare my model with existing state-of-the-art baseline systems. I provide empirical evidence for the first two models on multilingual name transliteration and on reranking for part-of-speech tagging, respectively. For the third problem setting, I propose to evaluate my model on the compositionality learning task, which aims to find the meaning of a sentence or phrase, represented as a vector in d-dimensional space, from the meanings of its constituent words. The details of optimizing the third model and its evaluation are left as future work in my dissertation.
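The simplest compositionality baseline represents a phrase as a function of its word vectors; a minimal additive sketch (with hypothetical toy vectors, not the structured tree-based model proposed here) illustrates the task's input and output:

```python
import numpy as np

# Hypothetical toy word vectors; a real system would learn these from
# corpus statistics (e.g. co-occurrence counts or a trained model).
rng = np.random.default_rng(0)
vocab = ["the", "red", "car", "fast"]
vecs = {w: rng.normal(size=8) for w in vocab}

def compose_additive(words, vecs):
    """Additive composition baseline: the phrase vector is the
    normalized sum of its constituent word vectors."""
    v = np.sum([vecs[w] for w in words], axis=0)
    return v / np.linalg.norm(v)

def cosine(u, v):
    """Cosine similarity between two phrase/word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

phrase = compose_additive(["red", "car"], vecs)
```

Note that additive composition is order-invariant ("red car" and "car red" get the same vector), which is exactly the kind of limitation that motivates models exploiting the phrase's tree structure.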

Examining Committee:

Dr. Hal Daumé III - Chair

Dr. Lise Getoor - Dept.'s Representative

Dr. Jordan Boyd-Graber - Committee Member

Dr. Philip Resnik - Committee Member

EVERYBODY IS INVITED TO ATTEND THE PRESENTATION

This talk is organized by Jeff Foster