Responsible data stewardship has largely been an afterthought over the last two decades, as new techniques and tools were rapidly developed to harness the potential of "big data". However, increased scrutiny by regulatory bodies (resulting in regulations such as GDPR and CCPA), high-profile data breaches, and the widespread use of data-driven processes to make life-altering decisions have brought issues of transparency, trust, and the responsible and ethical use of data to the forefront.
In this talk, I will discuss some of the new data management challenges that have emerged as a result of these developments, focusing in particular on the need to build novel privacy-first database systems that operationalize privacy-by-design principles. I will describe preliminary work that uses pseudonymization and synthetic data generation to transparently re-architect a relational database system to achieve a variety of privacy goals, and I will outline research challenges going forward. I will then briefly discuss other recent research projects in my group, including our prior and ongoing work on graph databases and on building a unified provenance and metadata management system to support data science lifecycle management.
Amol Deshpande is a Professor in the Department of Computer Science at the University of Maryland with a joint appointment in the University of Maryland Institute for Advanced Computer Studies (UMIACS). He received his Ph.D. from the University of California at Berkeley in 2004. His research interests include collaborative data science platforms, provenance, privacy, uncertain data management, adaptive query processing, data streams, graph analytics, and sensor networks. He is a recipient of an NSF CAREER award and has received best paper awards at the VLDB 2004, EWSN 2008, and VLDB 2009 conferences. He is also a Co-Founder and Chief Scientist at WireWheel, Inc., which is building a comprehensive platform to help companies comply with data privacy regulations such as GDPR and CCPA.