This talk will cover recent advances, both theoretical and practical, in high-dimensional approximate nearest neighbor search. First, we introduce partition-based methods such as locality-sensitive hashing, which work by partitioning a high-dimensional point set into multiple smaller cells and querying only a small number of those cells. Under reasonable assumptions, these methods provide theoretical guarantees as well as practical performance on some smaller datasets. Then, we cover the techniques that have far outperformed methods with theoretical guarantees--namely, graph-based ANNS with a greedy search strategy--and present extensive experimental evidence proving this point. We will conclude by highlighting current gaps between theory and practice and pointing out open theoretical problems inspired by those gaps.
Magdalen Dobson Manohar is a 5th year PhD student at Carnegie Mellon University advised by Guy Blelloch and supported by an NSF GRFP fellowship. She is interested in designing parallel and concurrent algorithms for solving problems related to similarity search, information retrieval, and computing nearest neighbors, with a particular focus on similarity search in high dimensions. Much of her recent work can be found in the ParlayANN repository, a benchmark suite of high-dimensional nearest neighbor algorithms along with a set of useful tools for designing such algorithms.