The first decade of data science has largely focused on what *can* be done with large, noisy, heterogeneous datasets.
The next decade of data science will be characterized by what *should* be done: How do we ensure accountability, fairness, and transparency in algorithmic decision-making to combat rather than reinforce inequities? As we apply these approaches in a social context, how do we ensure the privacy of individuals? As these techniques become increasingly democratized, how do we avoid junk science (spurious, non-reproducible findings)? How do we curate and expose existing data to make them "safe" for useful science?
In this talk, I'll describe some work underway in this space here at UW and elsewhere, and point out where we need greater investment from the larger data community. I'll focus on the deep curation project, which aims to automatically extract claims from scientific papers and validate them against open data to combat the reproducibility crisis in science.
Bill Howe is Associate Professor in the Information School, Adjunct Associate Professor in Computer Science & Engineering, and Associate Director of the UW eScience Institute. His research interests are in data management, curation, analytics, and visualization in the sciences. Howe played a leadership role in the Data Science Environment program at UW through a $32.8 million grant awarded jointly to UW, NYU, and UC Berkeley. With support from the MacArthur Foundation and Microsoft, Howe directs the Urbanalytics group at UW and UW's participation in the Cascadia Urban Analytics Cooperative with the University of British Columbia, where he focuses on data-intensive urban science. He founded the UW Data Science Masters Degree and serves as its inaugural Program Director and Faculty Chair. He has received two Jim Gray Seed Grant awards from Microsoft Research for work on managing environmental data, has had two papers selected for VLDB Journal's "Best of Conference" issues (2004 and 2010), and co-authored what are currently the most-cited papers from both VLDB 2010 and SIGMOD 2012. Howe serves on the program and organizing committees for a number of conferences in the area of databases and scientific data management, developed a first MOOC on data science that attracted over 200,000 students across two offerings, and founded UW's Data Science for Social Good program. He has a Ph.D. in Computer Science from Portland State University and a Bachelor's degree in Industrial & Systems Engineering from Georgia Tech.