Place-based Information Systems: Textual Location Identification and Visualization

Hanan Samet - University of Maryland, College Park

2117 Computer Science Instructional Center (CSI)

Friday, November 30, 2012, 1:00-2:00 pm

Abstract

The popularity of web-based mapping services such as Google Earth/Maps and Microsoft Virtual Earth (Bing) has led to an increasing awareness of the importance of location data and its incorporation into both web-based search applications and the databases that support them. In the past, attention to location data had been primarily limited to geographic information systems (GIS), where locations correspond to spatial objects and are usually specified geometrically. However, in web-based applications, location data often corresponds to place names and is usually specified textually.

An advantage of such a specification is that the same specification can be used regardless of whether the place name is to be interpreted as a point or a region. Thus the place name acts as a polymorphic data type, in the parlance of programming languages. Its drawback, however, is that it is ambiguous. In particular, a given specification may have several interpretations, not all of which are names of places. For example, "Jordan" may refer to either a person or a place. Moreover, there is additional ambiguity even when the specification does have a place-name interpretation. For example, "Jordan" can refer to a river or a country, while there are a number of cities named "London".
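The two kinds of ambiguity described above (whether a name refers to a place at all, and if so, to which place) can be illustrated with a toy lookup that returns every candidate reading of a name. This is a minimal sketch, assuming hypothetical hand-written lexicons (`GAZETTEER`, `PERSON_NAMES`) and a hypothetical `interpretations` function; it only enumerates candidates and does not try to choose among them, which is the harder problem a real document tagger must solve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    kind: str         # "place" or "person"
    description: str  # human-readable reading of the name


# Hypothetical toy lexicons; a real tagger would consult a full gazetteer
# and use the surrounding context to pick among the candidates returned here.
GAZETTEER = {
    "jordan": ["country in the Middle East", "river (the Jordan River)"],
    "london": ["city in England", "city in Ontario, Canada"],
}
PERSON_NAMES = {"jordan"}


def interpretations(token: str) -> list[Interpretation]:
    """List every candidate reading of `token` without resolving the ambiguity."""
    key = token.lower()
    readings = [Interpretation("place", d) for d in GAZETTEER.get(key, [])]
    if key in PERSON_NAMES:
        readings.append(Interpretation("person", f"person named {token}"))
    return readings


for name in ("Jordan", "London"):
    for r in interpretations(name):
        print(f"{name}: {r.kind} -> {r.description}")
```

"Jordan" yields both place and non-place readings, while "London" yields only place readings but more than one of them, which mirrors the two cases in the paragraph above.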

In this talk we examine the extension of GIS concepts to textually specified location data and review search engines that we have developed to retrieve documents where the similarity criterion is based not solely on an exact match of elements of the query string but also on spatial proximity. Thus we want to take advantage of spatial synonyms so that, for example, a query seeking a rock concert in College Park would be satisfied by a result finding a rock concert in Hyattsville or Greenbelt.

We have applied this idea to develop the STEWARD (Spatio-Textual Extraction on the Web Aiding Retrieval of Documents) system for finding documents on the website of the Department of Housing and Urban Development. This system relies on a document tagger that automatically identifies spatial references in text, PDF, Word, and other unstructured documents. The thesaurus for the document tagger is a gazetteer of the names of places in the world, formed from a collection of publicly available data sets. Search results are ranked according to the extent to which they satisfy the query, which is determined in part by the prevalent spatial entities present in the document.

We have also adapted the same ideas to collections of news articles and to Twitter tweets, resulting in the NewsStand and TwitterStand systems, respectively. These will be demonstrated along with the STEWARD system, in conjunction with a discussion of some of the underlying issues that arose and the techniques used in their implementation. Future work involves applying these ideas to spreadsheet data.
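The spatial-synonym idea can be sketched as a radius query over a gazetteer, followed by a ranking in which repeated mentions of nearby places raise a document's score. This is a minimal sketch under stated assumptions, not the STEWARD implementation: the mini-gazetteer and its approximate coordinates, the `Doc` record, the `spatial_synonyms` and `rank` functions, the 10 km radius, and the linear distance-decay score are all hypothetical choices made for illustration. A real system would also use a spatial index rather than a linear scan.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


@dataclass
class Doc:
    doc_id: str
    text: str
    places: list[str]  # place names a (hypothetical) tagger found in the text


# Hypothetical mini-gazetteer; coordinates are approximate, for illustration only.
GAZETTEER = {
    "college park": Place("College Park", 38.981, -76.937),
    "hyattsville": Place("Hyattsville", 38.956, -76.946),
    "greenbelt": Place("Greenbelt", 39.005, -76.876),
    "baltimore": Place("Baltimore", 39.290, -76.612),
}
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two places, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def spatial_synonyms(name: str, radius_km: float) -> list[str]:
    """Gazetteer places within radius_km of `name`, nearest first (including itself)."""
    origin = GAZETTEER[name.lower()]
    scored = sorted((haversine_km(origin, p), p.name) for p in GAZETTEER.values())
    return [n for d, n in scored if d <= radius_km]


def rank(terms: list[str], place: str, docs: list[Doc], radius_km: float) -> list[str]:
    """Ids of docs containing every term and mentioning a nearby place, best first.

    Each place mention adds 1 - distance/radius_km (0 outside the radius),
    so repeated mentions of nearby places raise a document's score.
    """
    origin = GAZETTEER[place.lower()]
    results = []
    for doc in docs:
        text = doc.text.lower()
        if not all(t.lower() in text for t in terms):
            continue  # the textual part of the query must still match
        score = 0.0
        for mention in doc.places:
            p = GAZETTEER.get(mention.lower())
            if p is not None:
                score += max(0.0, 1.0 - haversine_km(origin, p) / radius_km)
        if score > 0:
            results.append((score, doc.doc_id))
    results.sort(key=lambda r: r[0], reverse=True)
    return [doc_id for _, doc_id in results]


docs = [
    Doc("a", "Rock concert tonight", ["Hyattsville", "Hyattsville"]),
    Doc("b", "Rock concert downtown", ["Baltimore"]),
    Doc("c", "Rock concert in the park", ["Greenbelt"]),
    Doc("d", "Farmers market", ["College Park"]),
]
print(spatial_synonyms("College Park", 10))            # ['College Park', 'Hyattsville', 'Greenbelt']
print(rank(["rock", "concert"], "College Park", docs, 10))  # ['a', 'c']
```

With these assumed coordinates, Hyattsville and Greenbelt fall within 10 km of College Park while Baltimore does not, so a "rock concert in College Park" query returns the Hyattsville and Greenbelt documents even though neither mentions College Park.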

Bio

Hanan Samet (http://www.cs.umd.edu/~hjs/) is a Professor of Computer Science at the University of Maryland, College Park, and is a member of the Institute for Advanced Computer Studies. He is also a member of the Computer Vision Laboratory at the Center for Automation Research, where he leads a number of research projects on the use of hierarchical data structures for database applications involving spatial data. He has a Ph.D. from Stanford University. His doctoral dissertation dealt with proving the correctness of translations of LISP programs, which was the first work in translation validation. He is the author of the recent book "Foundations of Multidimensional and Metric Data Structures", published by Morgan Kaufmann, San Francisco, CA, in 2006 (http://www.mkp.com/multidimensional), an award winner in the 2006 best book in Computer and Information Science competition of the Professional and Scholarly Publishers (PSP) Group of the Association of American Publishers (AAP). He is also the author of the first two books on spatial data structures, "Design and Analysis of Spatial Data Structures" and "Applications of Spatial Data Structures: Computer Graphics, Image Processing and GIS", both published by Addison-Wesley, Reading, MA, in 1990. He is the Founding Editor-in-Chief of the ACM Transactions on Spatial Algorithms and Systems (TSAS) and the founding chair of ACM SIGSPATIAL. He is a recipient of the 2009 UCGIS Research Award, the 2010 CMPS Board of Visitors Award at the University of Maryland, and the 2011 ACM Paris Kanellakis Theory and Practice Award; a Fellow of the ACM, IEEE, AAAS, and IAPR (International Association for Pattern Recognition); and an ACM Distinguished Speaker. He received best paper awards at the 2008 SIGMOD Conference and the 2008 SIGSPATIAL ACMGIS'08 Conference, and a best demo award at the 2011 SIGSPATIAL ACMGIS'11 Conference. His paper at the 2009 IEEE International Conference on Data Engineering (ICDE) was selected as one of the best papers for publication in the IEEE Transactions on Knowledge and Data Engineering.

This talk is organized by Jeff Foster.