Knowledge bases and the Web of Linked Data have become important assets for search, recommendation, and analytics. Natural-language questions are a user-friendly mode of tapping this wealth of knowledge. We present a methodology for translating natural-language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program that solves several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the Web of Linked Data to constrain our semantic-coherence objective function.
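To make the target representation concrete, here is a minimal sketch of how a question, once segmented and mapped, becomes SPARQL-style triple patterns that are evaluated against a knowledge base. The toy triples, entity names, and the example question are invented for illustration and are not the system's actual data or algorithm.

```python
from itertools import product

# Toy knowledge base: a set of (subject, predicate, object) triples.
KB = {
    ("Nina", "type", "Singer"),
    ("Nina", "bornIn", "Berlin"),
    ("Max", "type", "Singer"),
    ("Max", "bornIn", "Hamburg"),
    ("Lena", "type", "Actor"),
    ("Lena", "bornIn", "Berlin"),
}

def is_var(term):
    """Variables follow the SPARQL convention of a leading '?'."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if binding.get(p_term, t_term) != t_term:
                return None
            binding[p_term] = t_term
        elif p_term != t_term:
            return None
    return binding

def evaluate(patterns):
    """Join all triple patterns, returning consistent variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b, triple in product(bindings, KB)
                    if (b2 := match_pattern(pattern, triple, b)) is not None]
    return bindings

# "Which singers were born in Berlin?" segments into phrases that map to
# the class Singer and the relation bornIn, yielding two triple patterns:
query = [("?x", "type", "Singer"), ("?x", "bornIn", "Berlin")]
answers = sorted({b["?x"] for b in evaluate(query)})
print(answers)  # only Nina is both a singer and born in Berlin
```

In the actual system, the hard part is not this evaluation but the joint disambiguation that produces the patterns in the first place, which the integer linear program resolves under type constraints from the knowledge base.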
We then present an extension of this work aimed at making our approach more robust to the variety of ways users phrase questions and to the inherent incompleteness of knowledge bases. The extension allows questions to be partially translated into relaxed queries, covering the essential aspects of the user's input, though not necessarily all of them. To compensate for the omissions, we exploit textual sources associated with entities and relational facts. Our system translates user questions into an extended form of structured SPARQL queries, with text predicates attached to triple patterns.
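The relaxation idea can be sketched as follows: when a relation the question mentions has no counterpart in the knowledge base, the corresponding triple pattern is dropped and a keyword filter over the entity's associated text compensates for it. Everything below — the entities, descriptions, and keywords — is a hypothetical illustration, not the system's implementation.

```python
# Toy knowledge base with only class membership; the "made" relation the
# question asks about is assumed to be missing from the KB.
KB = {
    ("Kubrick", "type", "Director"),
    ("Tarantino", "type", "Director"),
}

# Textual sources associated with entities (e.g., extracted descriptions).
TEXT = {
    "Kubrick": "directed the space epic 2001 and other films",
    "Tarantino": "known for crime films",
}

def answer(cls, keywords):
    """Match the structured part (?x type cls) and require that the
    entity's associated text contains every keyword of the text predicate."""
    candidates = {s for (s, p, o) in KB if p == "type" and o == cls}
    return sorted(x for x in candidates
                  if all(kw in TEXT.get(x, "") for kw in keywords))

# "Which director made a space epic?" -- the 'made' relation cannot be
# mapped, so the relaxed query keeps the class constraint and attaches
# the text predicate {"space", "epic"} to the remaining triple pattern.
print(answer("Director", {"space", "epic"}))
```

This mirrors the shape of the extended queries described above: a structured core of triple patterns, each of which may carry a text predicate that is matched against documents or phrases linked to the entities and facts.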
Mohamed Yahya is a doctoral student at the Max Planck Institute for Informatics and Saarland University in Germany. His research interests lie at the intersection of natural language processing (NLP), information retrieval (IR), and information extraction (IE). His research focuses on improving the usability of semantic knowledge bases by extending traditional querying paradigms and devising new ones to facilitate access to the wealth of information in such knowledge bases. He received his M.Sc. in Computer Science from Saarland University in 2010 and his B.Eng. in Computer Systems Engineering from Birzeit University in 2008. For more information, please visit http://mpii.de/~myahya