log in  |  register  |  feedback?  |  help  |  web accessibility
PhD Defense: Algorithms and Data Structures for Faster Nearest-Neighbor Classification
Alejandro Flores-Velazco
Monday, May 9, 2022, 10:00 am-12:00 pm Calendar
  • You are subscribed to this talk through .
  • You are watching this talk through .
  • You are subscribed to this talk. (unsubscribe, watch)
  • You are watching this talk. (unwatch, subscribe)
  • You are not subscribed to this talk. (watch, subscribe)
Given a set P of n labeled points in a metric space (X,d), the nearest-neighbor rule classifies an unlabeled query point q in X with the class of q's closest point in P. Despite the advent of more sophisticated techniques, nearest-neighbor classification is still fundamental for many machine-learning applications. Over the years, this has motivated numerous research aiming to reduce its high dependency on the size and dimensionality of the data. This dissertation presents various approaches to reduce the dependency of the nearest-neighbor rule from n to some smaller parameter k, that describes the intrinsic complexity of the class boundaries of P. This is of particular significance as it is usually assumed that k is much less than n on real-world training sets.

One natural way to achieve this dependency reduction is to reduce the training set itself, selecting a subset R of P to be used by the nearest-neighbor rule to answer incoming queries, instead of using P. Essentially, reducing its dependency from n, the size of P, to the size of R. We propose different techniques to select R, all of which select subsets whose sizes are proportional to k, and have varying degrees of correct classification guarantees.

Another alternative involves building data structures designed to answer these classification queries, bypassing the preprocessing step of reducing P. We propose the Chromatic AVD to answer approximate nearest-neighbor classification queries, and whose query times and storage requirements dependent on k_epsilon, which describes the intrinsic complexity of the epsilon-approximate class boundaries of P.

Examining Committee:
Dean's Representative:
Dr. David M. Mount                              
Dr. Yancy Diaz-Mercado 
Dr.  MohammadTaghi HajiAghayi
Dr. John P. Dickerson
Dr.  Hanan Samet

Dr. Samir Khuller

Alejandro Flores-Velazco is a CS PhD candidate at the University of Maryland, College Park, advised by Prof. David Mount. His research interests lie generally in Theoretical Computer Science, including Computational Geometry, Approximation Algorithms, and Data Structures. He received his Bachelor's degree from Universidad Simon Bolivar (Venezuela), and will be joining Google (NYC) this summer.

This talk is organized by Tom Hurst