Query Processing Challenges for Linked Data
Maria Esther Vidal - Universidad Simon Bolivar
Monday, August 26, 2013, 2:00-3:00 pm
Abstract

Emerging technologies that support networks of sensors, scientific annotation graphs, or social networks are making available extremely large volumes of linked data that can be naturally represented as graphs. In the context of the Semantic Web, scalable RDF engines have been defined to store and consume RDF graphs. The most efficient RDF engines and their query processing algorithms rely on physical operators and storage structures that exploit the properties of RDF graphs to efficiently consume data that is locally stored. However, because of the size of existing linked datasets, loading and processing the data and their links locally is not feasible.
Different approaches have been proposed to access federations of linked data using SPARQL endpoints. In this talk, I will describe two query processing strategies and the performance behavior of existing query engines. In the first part of the talk, ANAPSID will be presented. ANAPSID is an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availability and run-time conditions when linked data is remotely accessed. ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty; these operators opportunistically produce results as quickly as linked data arrives from the endpoints. ANAPSID's performance will be compared with that of state-of-the-art federated engines. Experimental results show that ANAPSID can speed up execution time, in some cases by more than one order of magnitude. In the second part of the talk, I will focus on different RDF and graph database engines, and on the performance of these engines when executing graph-oriented tasks, including reachability, traversal, adjacency, pattern matching, densest subgraph, and graph summarization, on a variety of RDF graphs.

Results of this research have been presented at ISWC 2011 and 2012, in tutorials at ESWC 2012 and 2013, and in demonstrations at ISWC 2013.

Bio

María-Esther Vidal is a Full Professor in the Computer Science Department and Assistant Dean for Research in Applied Science and Engineering at the Universidad Simón Bolívar, Caracas, Venezuela. Her research in information management covers information integration, federated databases, graph data management, Linked Open Data, and the Semantic Web. María-Esther has addressed some of the most important challenges in selecting and modeling sources, rewriting queries, cost-based optimization, graph query processing and optimization, and benchmarks for federated SPARQL query processing, among others. Her proposed strategies have had significant impact from the early days of information integration on the Web in the late 1990s, through the emergence of the Semantic Web and SPARQL endpoints, to the more recent successes of Linked Open Data. She has published her research results in the premier conferences and journals in Database Management, Artificial Intelligence, and the Semantic Web.

This talk is organized by Amol