Handling the Three Vs of Big Data with DILoS
Tuesday, May 14, 2013, 11:00 am-12:00 pm
Abstract

For the past few years, our group has been working on problems related
to Big Data through several projects. After a brief discussion of these
projects, the rest of the talk presents DILoS, which focuses on three
of Big Data's five Vs: volume, velocity, and variability.

Today, the ubiquity of sensing devices, as well as of mobile and web
applications, continuously generates huge amounts of data in the form
of streams. These data streams are typically high-volume, often
high-velocity (speed), and high-variability (bursty). Not only the
rates but also the value distributions of data streams usually
fluctuate in an unpredictable fashion. To meet the near-real-time
requirements of monitoring applications and of the emerging "Big Data"
applications, data streams need to be continuously processed and
analyzed. Such processing happens inside data stream management
systems (DSMSs), which efficiently support continuous queries (CQs).

CQs inherently have different levels of criticality and hence
different levels of expected quality of service (QoS) and quality of
data (QoD). To provide different quality guarantees, i.e., service
level agreements (SLAs), to different client stream applications, we
developed DILoS, a novel framework that exploits the synergy between
scheduling and load shedding in DSMSs. In overload situations, DILoS
enforces worst-case response times for all CQs while providing
prioritized QoD, i.e., minimizing data loss for query classes
according to their priorities. We further propose ALoMa, a new
adaptive load manager scheme that enables the realization of the DILoS
framework. ALoMa is a general, practical DSMS load shedder that
outperforms the state of the art in deciding when the DSMS is
overloaded and how much load needs to be shed. We implemented DILoS in
our real DSMS prototype system (AQSIOS) and evaluated its performance
on a variety of real and synthetic workloads. Our experiments show
that our framework (1) allows the scheduler and load shedder to
consistently honor CQs' priorities and (2) maximizes the utilization
of the system's processing capacity to reduce load shedding.
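
To make the idea of prioritized QoD concrete, the following is a
minimal sketch of strict-priority load shedding. It is not the DILoS
or ALoMa algorithm, whose details are not given in this abstract. The
names QueryClass and plan_shedding, the load units, and the rule that a
larger priority value means a more critical class are all assumptions
made for illustration. The sketch also ignores scheduling and
response-time enforcement, which DILoS handles together with shedding.

```python
from dataclasses import dataclass


@dataclass
class QueryClass:
    name: str       # hypothetical label for a class of continuous queries
    priority: int   # assumption: larger value = more critical class
    load: float     # processing demand per time unit, in arbitrary units


def plan_shedding(classes, capacity):
    """Return {class name: fraction of input to drop}.

    Sheds from the lowest-priority class upward until the remaining
    total load fits within the given capacity.
    """
    excess = sum(c.load for c in classes) - capacity
    plan = {c.name: 0.0 for c in classes}
    for c in sorted(classes, key=lambda qc: qc.priority):
        if excess <= 0:
            break  # system is no longer overloaded
        if c.load <= 0:
            continue  # nothing to shed from an idle class
        dropped = min(c.load, excess)
        plan[c.name] = dropped / c.load
        excess -= dropped
    return plan


# Illustrative numbers only: total demand 120 against a capacity of 70.
classes = [
    QueryClass("alerts", priority=3, load=40.0),
    QueryClass("dashboards", priority=2, load=40.0),
    QueryClass("archival", priority=1, load=40.0),
]
plan = plan_shedding(classes, capacity=70.0)
print(plan)
```

With these made-up numbers, the lowest-priority class is shed entirely,
the middle class loses a quarter of its input, and the most critical
class loses nothing. A real load manager would also have to estimate
the load and capacity values at run time, which this sketch takes as
given.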

Bio

Panos K. Chrysanthis is a Professor of Computer Science and the
founding director of the Advanced Data Management Technologies
Laboratory (ADMT Lab) [http://db.cs.pitt.edu] at the University of
Pittsburgh. His lab has a broad focus on user-centric data management
for scalable network-centric and collaborative applications and has
fostered interdisciplinary collaborations between computer science,
medicine, and astronomy, both within and outside the University of
Pittsburgh. He is an Adjunct Professor at Carnegie Mellon University
and at the University of Cyprus. In 1995, he received one of the first
NSF CAREER Awards for his pioneering work on mobile data management.
In 2007, he was elevated to the level of Senior Member of IEEE, and in
2010, he was recognized as a Distinguished Scientist by ACM.

DILoS was developed in collaboration with Thao N. Pham (as part of her
PhD thesis) and Alexandros Labrinidis, who is the co-director of the
ADMT Lab. This work has been funded in part by two NSF Awards and a
gift from EMC/Greenplum.

This talk is organized by Amol