Data quality: the other face of big data
Wednesday, April 29, 2015, 4:00-5:00 pm
Abstract

In our Big Data era, data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Recent studies have shown that poor-quality data is prevalent in large databases and on the Web. Since poor-quality data can have serious consequences for the results of data analyses, the importance of veracity, the fourth “V” of big data, is increasingly being recognized.

In this talk, we highlight the substantial challenges that the first three “V”s, volume, velocity and variety, bring to dealing with veracity in big data. Due to the sheer volume and velocity of data, one needs to understand and (possibly) repair erroneous data in a scalable and timely manner. With the variety of data, often from a diversity of sources, data quality rules cannot be specified a priori; one needs to let the data “speak for itself” in order to discover the semantics of the data. This talk presents recent results that are relevant to big data quality management, focusing on two major dimensions: (i) discovering quality issues from the data itself, and (ii) trading off accuracy against efficiency.

Bio

Divesh Srivastava is the head of Database Research at AT&T Labs-Research. He received his Ph.D. from the University of Wisconsin, Madison, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India. He is a Fellow of the Association for Computing Machinery (ACM), on the board of trustees of the VLDB Endowment, the managing editor of the Proceedings of the VLDB Endowment (PVLDB), and an associate editor of the ACM Transactions on Database Systems (TODS). He has served as associate Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE), as the Co-chair of the Program Committees of many international conferences including VLDB 2007 and ICDE 2015 (Industrial), and as General Co-chair of SIGMOD 2013. He has presented keynote talks at several international conferences, including VLDB 2010. His research interests and publications span a variety of topics in data management.

This talk is organized by Amol