Supporting User-Centered Analytical Interfaces at Scale

Dixin Tang

IRB 4105, Zoom link: https://umd.zoom.us/j/92977540316?pwd=NVF2WTc5SS9RSjFDOGlzcENKZnNxQT09

Wednesday, March 1, 2023, 11:00 am-12:00 pm

Abstract

Many data analytical tools, such as spreadsheets, visual analytical tools, and Python data analysis libraries, are built to help the general public make sense of data and gain insights. These tools are widely adopted by people with little or no programming experience. Their popularity is mainly attributed to their intuitive, easy-to-use interfaces, referred to as user-centered analytical interfaces. Unfortunately, when faced with large datasets, the modern data analytical stack that supports these interfaces suffers from significant problems with interactivity, scalability, and resource utilization.

In this talk, I will present my research on transforming the modern data analytical stack to efficiently support user-centered analytical interfaces at scale. I will focus on two projects that address the interactivity and scalability problems. First, I will present transactional panorama, a formal framework that lets end users consume results in progressively updating visualizations while preserving desirable properties (e.g., coherence) and performance. Transactional panorama extends database transactions to model the user’s interaction with progressively updating visualizations and opens a research direction that brings transactions into end-user analytics. Next, I will discuss decomposition rules for parallelizing the execution of pandas, a popular Python data analysis library. These decomposition rules adapt traditional parallel execution techniques to account for the new data model, API, and access patterns in pandas. Finally, I will discuss future projects that bring large-scale data analysis to the masses.

Bio

Dixin Tang is a postdoctoral scholar at UC Berkeley working with Prof. Aditya G. Parameswaran. Prior to that, he received his Ph.D. from the University of Chicago, advised by Prof. Aaron J. Elmore. His research is broadly in data management, with a focus on building usable, scalable, and resource-efficient data systems that let the general public easily analyze large-scale datasets.

This talk is organized by Richa Mathur