Scaling Up Learning: from Big Data to Little Features
Wednesday, April 25, 2012, 1:30-2:30 pm
Abstract

It is a common assumption that dealing with “big data” is the main challenge in scaling up machine learning and prediction tasks. However, training set size is only one of many motivations for developing high-performance learning methods. The talk will illustrate the diversity of efficiency-related problems in machine learning via a brief survey of several canonical algorithms and application scenarios. Then, the talk will discuss a scaling-motivated problem that has not received attention in the literature but is ubiquitous in industrial applications: predicting the relevance of new features. While identifying new informative features is the main pathway for improving accuracy in mature applications, evaluating every potential feature by re-training can be costly computationally, logistically, and financially. The talk will describe a principled, learner-independent technique for estimating new feature utility, derived via a connection between a feature’s loss-reduction potential and its correlation with the loss gradient, leading to a simple, embarrassingly parallel hypothesis-testing procedure.
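To make the core idea concrete, here is a minimal sketch of the kind of procedure the abstract describes; it is not the speaker's actual method, only an illustration under an assumed squared-loss setting. Under squared loss, the gradient of the loss with respect to a model's predictions is simply the residual (prediction minus target), so a candidate feature's correlation with that gradient signals potential to reduce the loss. The function name `new_feature_utility` and the use of a Pearson correlation test are assumptions for this sketch; since each candidate is scored independently against the same fixed residuals, scoring many candidates is embarrassingly parallel.

```python
import numpy as np
from scipy import stats


def new_feature_utility(candidate, predictions, targets):
    """Score a candidate feature by correlating it with the loss gradient.

    Sketch only: under squared loss, the per-example gradient w.r.t. the
    model's predictions is (predictions - targets). A candidate feature
    that correlates significantly with this gradient could reduce the
    loss if added to the model. Returns the Pearson correlation and the
    p-value of the test that the true correlation is zero.
    """
    loss_gradient = predictions - targets  # residuals under squared loss
    r, p_value = stats.pearsonr(candidate, loss_gradient)
    return r, p_value


# Illustrative usage with synthetic data: a trivial model (all-zero
# predictions) misses a hidden signal; the informative candidate is
# strongly correlated with the residual, while a pure-noise candidate
# is not.
rng = np.random.default_rng(0)
n = 500
hidden = rng.normal(size=n)
noise_feature = rng.normal(size=n)
targets = hidden + 0.1 * rng.normal(size=n)
predictions = np.zeros(n)  # trivial baseline model

r_good, p_good = new_feature_utility(hidden, predictions, targets)
r_noise, p_noise = new_feature_utility(noise_feature, predictions, targets)
```

Because each candidate only needs the fixed residual vector, no re-training of the base model is required to screen features.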

Bio

Misha Bilenko is a researcher in the Machine Learning Group at Microsoft Research, which he joined after receiving his Ph.D. from the University of Texas at Austin. He is interested in learning algorithms and systems for large-scale behavioral, transactional, and textual tasks. Problems on which he has worked extensively include entity resolution, semi-supervised clustering, and prediction tasks in online advertising. His work has received best paper awards from SIGIR and KDD. He recently co-edited the collection “Scaling Up Machine Learning,” published by Cambridge University Press.

This talk is organized by Jay.