log in  |  register  |  feedback?  |  help  |  web accessibility
Logo
Hierarchical Clustering for Everyone
Friday, October 29, 2021, 1:00-2:00 pm Calendar
  • You are subscribed to this talk through .
  • You are watching this talk through .
  • You are subscribed to this talk. (unsubscribe, watch)
  • You are watching this talk. (unwatch, subscribe)
  • You are not subscribed to this talk. (watch, subscribe)
Abstract

Compared to the highly successful flat clustering methods (e.g. k-means), despite its important role in data analysis, hierarchical clustering has been lacking in rigorous algorithmic studies until late due to absence of rigorous objectives. Since 2016, a sequence of works has emerged and gave novel algorithms for this problem. This was enabled by a breakthrough by Dasgupta, who introduced a formal optimization objective into the study of hierarchical clustering.

In this talk I will give an overview of our recent progress on models and scalable algorithms for hierarchical clustering applicable to both arbitrary and high-dimensional vector data, including embedding vectors arising from deep learning. I will first discuss various linkage-based algorithms (single-linkage, average-linkage) and their formal properties with respect to various objectives. I will then introduce 1) a new projection-based approximation algorithm for vector data, 2) a new partitioning-based algorithm for arbitrary data. I will also discuss scalable implementations using projected gradient descent and experimental results on large-scale vector embedding datasets from deep learning and other methods. The talk will be self-contained and doesn’t assume prior knowledge of clustering methods.

Bio

Grigory Yaroslavtsev (http://grigory.ai) is an assistant professor of George Mason University. He works on foundational questions in scalable algorithms for machine learning and data science. His work has been supported by NSF CRII Award and Facebook Faculty Research Award. Before joining GMU Grigory was the founding director of the Center for Algorithms and Machine Learning at Indiana University, Bloomington where he was an assistant professor of Computer Science and an adjunct assistant professor of Statistics (by courtesy). Grigory held a visiting position at the Alan Turing Institute (London, UK) and postdoctoral fellowships at the Warren Center for Network and Data Sciences at the University of Pennsylvania (joint at CIS and Wharton Stat) and at Brown University, ICERM. Grigory received his Ph.D. in theoretical computer science in 2014 from Penn State.

This talk is organized by sharmila