Many data science projects bring together team members from multiple disciplines with complementary backgrounds, including domain experts with limited IT skills and computer scientists who lack domain knowledge. Typically they rely on tools such as GitHub, Google Drive, or even email attachments to share code and data files, which is very inefficient. In this talk we present our effort to support collaborative data analytics with a user experience similar to that provided by Google Docs for shared editing and Overleaf for paper writing. We present our open source system called Texera, which has been under development for the past six years. It provides collaboration-oriented features such as GUI-based workflows using cloud services, shared editing, shared execution, version control, commenting, debugging, and support for multiple languages (e.g., Python and R). Given the increasing importance of machine learning in data science, Texera has rich features to support ML-related analysis. We will discuss technical challenges related to these features and our solutions. The system has been used by more than 200 people to conduct more than 60 data projects on various topics. We will also share our vision of developing a data science community for a broad audience.
Chen Li is a professor in the Department of Computer Science at UC Irvine. He received his Ph.D. degree in Computer Science from Stanford University, and his M.S. and B.S. in Computer Science from Tsinghua University, China. He was a recipient of an NSF CAREER award and several test-of-time publication awards, a part-time visiting research scientist at Google, and PC co-chair of VLDB 2015. He is an ACM distinguished member and an IEEE fellow. Since January 2020, he has been the treasurer and a board member of the VLDB Endowment. He was a co-founder and CTO of a startup founded to commercialize his research results.