Building the Next Generation of Data-Intensive Systems: From Complex Event Processing to Large-Scale Analytics
Barzan Mozafari - MIT
Friday, February 22, 2013, 11:00 am-12:00 pm
Abstract

Databases have been a successful abstraction for accessing and managing data in traditional workloads. However, the rapid growth of data and the demand for more complex analytics have significantly limited the scalability and applicability of these systems beyond classic business data processing. In my talk, I will explain how my research addresses these two challenges.

First, I will introduce a system I have built for supporting complex event processing over both stored and streaming data. This system extends existing database query languages with minimal but powerful constructs that enable a wide range of advanced applications, such as high-frequency trading, click-stream analysis, and the analysis of function-call traces. Using the recently proposed Visibly Pushdown Automata as the system's underlying model, I will present several optimization techniques for implementing these languages efficiently, which raise throughput by several orders of magnitude over previous systems.

In the second part of my talk, I will turn to the scalability challenge and briefly introduce BlinkDB, a parallel query engine that enables interactive, ad-hoc queries over massive volumes of data in a MapReduce cluster. I will demonstrate how BlinkDB employs sophisticated optimization and sampling strategies to achieve sub-second latency on tens of terabytes to petabytes of data.
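For readers unfamiliar with Visibly Pushdown Automata (VPA), the following is a minimal Python sketch of the general model, applied to a function-call trace. It is an illustration under stated assumptions, not the speaker's system, query language, or optimizations. The event encoding ("call", "ret", "int"), the class name `VPA`, the monitor `txn_monitor`, and the property it checks (no "error" event anywhere inside a "txn" call) are all hypothetical. The defining feature is that the kind of each event alone decides the stack operation: calls push, returns pop, and internal events leave the stack untouched.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# Hypothetical event encoding: (kind, label), kind in {"call", "ret", "int"}.
Event = Tuple[str, str]


@dataclass
class VPA:
    """A deterministic visibly pushdown automaton (minimal sketch).

    Calls always push, returns always pop, internal events never touch the
    stack. A transition returning None sends the run to an implicit dead state.
    """
    start: Hashable
    accepting: frozenset
    on_call: Callable[[Hashable, str], Tuple[Hashable, Hashable]]
    on_ret: Callable[[Hashable, str, Hashable], Optional[Hashable]]
    on_int: Callable[[Hashable, str], Optional[Hashable]]

    def accepts(self, trace: Iterable[Event]) -> bool:
        state: Optional[Hashable] = self.start
        stack: List[Hashable] = []
        for kind, label in trace:
            if kind == "call":
                state, gamma = self.on_call(state, label)
                stack.append(gamma)
            elif kind == "ret":
                if not stack:
                    return False  # simplification: reject unmatched returns
                state = self.on_ret(state, label, stack.pop())
            elif kind == "int":
                state = self.on_int(state, label)
            else:
                raise ValueError(f"unknown event kind: {kind!r}")
            if state is None:
                return False
        # Accept only well-matched traces that end in an accepting state.
        return not stack and state in self.accepting


# Hypothetical property: no "error" event may occur inside a "txn" call
# (including functions it calls), and every return must match its call.
def _call(state, label):
    # Push the callee name and the caller's state so the return can restore it.
    inside = state == "in_txn" or label == "txn"
    return ("in_txn" if inside else "outside"), (label, state)


def _ret(state, label, gamma):
    callee, caller_state = gamma
    return caller_state if callee == label else None  # mismatched return: reject


def _int(state, label):
    return None if (state == "in_txn" and label == "error") else state


txn_monitor = VPA(start="outside", accepting=frozenset({"outside"}),
                  on_call=_call, on_ret=_ret, on_int=_int)
```

Storing the caller's state on the stack is what lets the automaton resume correctly after an arbitrarily deep nested call, which a plain finite automaton cannot do for unbounded nesting. For example, `txn_monitor.accepts([("call", "txn"), ("call", "log"), ("int", "error"), ("ret", "log"), ("ret", "txn")])` rejects the trace because the error occurs inside the transaction.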



Bio

Barzan Mozafari is currently a Postdoctoral Associate at the Massachusetts Institute of Technology. He earned his PhD in Computer Science from the University of California, Los Angeles. He is passionate about building practical, large-scale data-intensive systems, with a particular interest in database-as-a-service, distributed systems, and the integration of machine learning and crowdsourcing into database systems. He has received several fellowships and awards, including the SIGMOD 2012 Best Paper Award for his work on high-performance complex event processing.

 

This talk is organized by Adelaide Findlay