ML Seminar<br><br><a href="https://talks.cs.umd.edu/talks/17">Bayesian exploration for approximate dynamic programming</a><br><a href="http://www.rhsmith.umd.edu/doit/faculty/ryzhov.aspx">Ilya Ryzhov - University of Maryland</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">2460 A.V. Williams Building (AVW)</a><br>Wednesday, February 29, 2012, 1:30-2:30 pm<br><br><b>Abstract:</b> <p>
Approximate dynamic programming (ADP) is a powerful methodology for solving large-scale, complex stochastic optimization problems in energy, inventory management, finance, transportation, and artificial intelligence. In such problems, each decision carries immediate economic benefits (selling inventory, allocating resources), but it also provides information about the problem that helps us make better decisions in the future. This tradeoff, known as "exploration vs. exploitation," has long been studied in the literature on optimal learning and multi-armed bandits, but much of the work in this area does not easily carry over to ADP. We bridge the gap between optimal learning and ADP using the concept of "value of information." We interpret information as an economic quantity, which can thus be traded off against other economic benefits. The value of information depends on a Bayesian belief about the quality of a decision, but also accounts for the uncertainty inherent in that belief. We show how this approach can be integrated into powerful classes of ADP models such as basis functions and hierarchical aggregation to efficiently optimize the decision-making policy. Both theoretical and experimental results are discussed.</p>
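The value-of-information idea can be made concrete with the knowledge-gradient policy for independent normal beliefs, a standard example from the optimal-learning literature. This is an illustrative sketch, not the talk's ADP integration; all names and parameters are invented for the example.

```python
import math

def phi(z):  # standard normal pdf
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):  # standard normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def knowledge_gradient(mu, sigma2, noise2):
    """Expected value of one more measurement of each alternative, under
    independent normal beliefs N(mu[i], sigma2[i]) and observation noise
    variance noise2. Larger values mean more valuable information."""
    kg = []
    for i in range(len(mu)):
        # std. dev. of the belief change from one observation:
        # sigma_tilde = sigma^2 / sqrt(sigma^2 + noise^2)
        s = sigma2[i] / math.sqrt(sigma2[i] + noise2) if sigma2[i] > 0 else 0.0
        if s == 0:
            kg.append(0.0)
            continue
        best_other = max(mu[j] for j in range(len(mu)) if j != i)
        z = -abs(mu[i] - best_other) / s
        kg.append(s * (z * Phi(z) + phi(z)))  # E[improvement of the best]
    return kg

mu = [1.0, 0.8, 0.5]        # posterior means of three decisions
sigma2 = [0.01, 1.0, 1.0]   # the leader is nearly known; the rest are not
noise2 = 1.0
kg = knowledge_gradient(mu, sigma2, noise2)
best = max(range(3), key=lambda i: kg[i])
```

The exploration-exploitation tradeoff shows up directly: the almost-certain leader gains little from another measurement, so the policy prefers to measure the uncertain runner-up.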
<br><b>Bio:</b> <p>
Ilya O. Ryzhov is an Assistant Professor of Operations Management and Management Science in the Robert H. Smith School of Business, University of Maryland. He received a Ph.D. in Operations Research and Financial Engineering from Princeton University in 2011. He is also the co-author (with W.B. Powell) of <i>Optimal Learning</i>, published in 2012 by John Wiley and Sons. His research deals with efficient information collection in application areas such as energy, disaster relief, and operations management.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/18">Tractable market making in combinatorial prediction markets</a><br><a href="http://research.yahoo.com/Miroslav_Dudik">Miroslav Dudik - Yahoo! Research</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">2460 A.V. Williams Building (AVW)</a><br>Wednesday, March 14, 2012, 1:30-2:30 pm<br><br><b>Abstract:</b> <p>
<span style="font-size:11pt">Prediction markets are emerging as a powerful and accurate method of aggregating information from populations of experts (and non-experts). Traders in prediction markets are incentivized to reveal their information by buying and selling "securities" for events such as "Romney to win the primary in Illinois". The prices of securities reflect the aggregate belief about the events, and the key challenge is to price the securities correctly.<br>
<br>
We present a new automated market maker for providing liquidity across multiple logically interrelated securities. Our approach lies somewhere between the industry standard---treating related securities as independent and thus not transmitting any information from one security to another---and a full combinatorial market maker for which pricing is computationally intractable. Our market maker, based on convex optimization and constraint generation, is tractable like independent securities yet propagates some information among related securities like a combinatorial market maker, resulting in more complete information aggregation. Our techniques borrow heavily from variational inference in exponential families. We prove several favorable properties of our scheme and evaluate its information aggregation performance on survey data involving hundreds of thousands of complex predictions about the 2008 U.S. presidential election.<br>
<br>
Joint work with Sebastien Lahaie and David Pennock.</span></p>
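The industry-standard baseline of independently priced securities is typically implemented with a cost-function market maker such as Hanson's LMSR, which the combinatorial scheme generalizes. A minimal LMSR sketch (the liquidity parameter and trade sizes below are illustrative):

```python
import math

def lmsr_cost(q, b=100.0):
    """Hanson's LMSR cost function C(q) = b * log(sum_i exp(q_i / b))."""
    m = max(q)  # stabilize the log-sum-exp
    return m + b * math.log(sum(math.exp((qi - m) / b) for qi in q))

def lmsr_prices(q, b=100.0):
    """Instantaneous prices are the softmax of q/b; they always sum to 1
    and can be read as the market's probability estimates."""
    m = max(q)
    e = [math.exp((qi - m) / b) for qi in q]
    s = sum(e)
    return [x / s for x in e]

def buy(q, i, shares, b=100.0):
    """Cost of buying `shares` of security i at outstanding-share state q."""
    q2 = list(q)
    q2[i] += shares
    return lmsr_cost(q2, b) - lmsr_cost(q, b), q2

q = [0.0, 0.0]             # two mutually exclusive outcomes, prices 0.5 / 0.5
cost, q = buy(q, 0, 50.0)  # buying outcome 0 pushes its price up
p = lmsr_prices(q)
```

Independent markets run one such maker per security, which is exactly the "not transmitting any information from one security to another" baseline the abstract contrasts against.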
<br><b>Bio:</b> <p>
<span style="font-size:11pt">Miroslav Dudik is a research scientist at Yahoo! Labs. His interests are in combining theoretical and applied aspects of machine learning, statistics, convex optimization and algorithms. He received his PhD from Princeton in 2007.</span></p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/65">Detecting Sentiment and Identifying Influence in Social Media</a><br><a href="http://www.prem-melville.com/">Prem Melville - IBM Research, Machine Learning Group</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">2460 A.V. Williams Building (AVW)</a><br>Wednesday, April 11, 2012, 1:30-2:30 pm<br><br><b>Abstract:</b> <p style="margin-bottom: 0in">
The explosion of user-generated content on the Web has led to new opportunities and significant challenges for companies that are increasingly concerned about monitoring the discussion around their products. Marketing organizations need to be aware of what people are saying in influential blogs, how the expressed opinions could impact their business, and how to extract business insight and value from these blogs. This has given rise to the emerging discipline of Social Media Analytics, which draws from Social Network Analysis, Machine Learning, Data Mining, Information Retrieval, and Natural Language Processing. This talk discusses two fundamental challenges in the analysis of social media: detecting sentiment and identifying influence in networks.</p>
<p style="text-indent: 0.5in; margin-bottom: 0in">
Sentiment Analysis focuses on the task of automatically identifying whether a piece of text expresses a positive or negative opinion about the subject matter. Most previous work in this area uses prior lexical knowledge in terms of the sentiment-polarity of words. In contrast, some recent approaches treat the task as a text classification problem, where they learn to classify sentiment based only on labeled training data. In this talk, we present a unified framework in which one can use background lexical information in terms of word-class associations, and refine this information for specific domains using any available training examples. This work has led to the formulation of a general Machine Learning framework called Dual Supervision, where classifiers can be built using both example labels and “feature labels.”</p>
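One simple way to combine "feature labels" with example labels (a toy stand-in, not the paper's exact dual-supervision formulation) is to seed a Naive Bayes model with pseudo-counts from a sentiment lexicon and then refine it with labeled documents. The lexicon, weights, and documents below are invented for illustration:

```python
import math
from collections import Counter

# Toy lexicon of word-class associations ("feature labels").
LEXICON = {"great": "pos", "love": "pos", "terrible": "neg", "awful": "neg"}

def train(labeled_docs, lexicon=LEXICON, lexicon_weight=2.0, alpha=1.0):
    """Multinomial NB whose word counts are seeded with lexicon
    pseudo-counts, then refined by labeled examples."""
    counts = {"pos": Counter(), "neg": Counter()}
    ndocs = {"pos": 0, "neg": 0}
    for word, cls in lexicon.items():
        counts[cls][word] += lexicon_weight  # background lexical knowledge
    for words, cls in labeled_docs:          # domain-specific refinement
        ndocs[cls] += 1
        counts[cls].update(words)
    vocab = set()
    for c in counts.values():
        vocab |= set(c)
    return counts, ndocs, vocab, alpha

def classify(words, model):
    counts, ndocs, vocab, alpha = model
    total_docs = sum(ndocs.values()) or 1
    best, best_lp = None, -math.inf
    for cls in ("pos", "neg"):
        lp = math.log((ndocs[cls] + 1) / (total_docs + 2))  # smoothed prior
        denom = sum(counts[cls].values()) + alpha * len(vocab)
        for w in words:
            lp += math.log((counts[cls][w] + alpha) / denom)
        if lp > best_lp:
            best, best_lp = cls, lp
    return best

train_docs = [(["service", "slow"], "neg"), (["food", "fresh"], "pos")]
model = train(train_docs)
label = classify(["great", "food"], model)
```

With no labeled data the classifier falls back on the lexicon; as labeled examples accumulate, the learned counts dominate the pseudo-counts.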
<p style="text-indent: 0.5in; margin-bottom: 0in">
Much work in Social Network Analysis has focused on identifying the most important actors in a social network. This has resulted in several measures of influence, authority, centrality, or prestige. Most such sociometrics (e.g., PageRank) are driven by intuitions based on an actor's location in a network. It is our position that asking for the "most influential" actors is an ill-posed question unless it is put in context with a specific, measurable task. Constructing a predictive task of interest in a given domain provides a mechanism to quantitatively compare different measures of influence. Furthermore, when we know what type of actionable insight to gather, we need not rely on a single network prestige measure. A combination of measures is more likely to capture the aspects of the social network that are predictive and beneficial for the task. To this end, we introduce supervised rank aggregation techniques and show the benefits of locally-optimal order-based rank aggregation. We illustrate these ideas through a case study on a data set of 40 million Twitter users, where we study measures of influence in the context of predicting when users will be rebroadcast.</p>
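The idea of tuning a combination of influence measures against a measurable task can be sketched with a weighted Borda-count aggregation fit to a supervised target ranking. This is a toy stand-in, not the locally-optimal order-based method from the talk; the measures and target below are invented:

```python
from itertools import product

def borda(ranking, items):
    """Borda scores: items ranked earlier get higher scores."""
    n = len(items)
    return {item: n - ranking.index(item) for item in items}

def aggregate(rankings, weights, items):
    """Rank items by the weighted sum of their Borda scores."""
    scores = {i: 0.0 for i in items}
    for r, w in zip(rankings, weights):
        for i, s in borda(r, items).items():
            scores[i] += w * s
    return sorted(items, key=lambda i: -scores[i])

def fit_weights(rankings, target, items, grid=(0.0, 0.5, 1.0)):
    """Supervised step: pick the weight vector whose aggregate ranking
    agrees most (position-wise) with the target ranking."""
    def agreement(r):
        return sum(a == b for a, b in zip(r, target))
    best_w, best_a = None, -1
    for w in product(grid, repeat=len(rankings)):
        a = agreement(aggregate(rankings, w, items))
        if a > best_a:
            best_w, best_a = w, a
    return best_w

items = ["a", "b", "c"]
followers = ["a", "b", "c"]   # ranking by a follower-count-like measure
retweets = ["c", "b", "a"]    # ranking by a retweet-rate-like measure
target = ["c", "b", "a"]      # who actually got rebroadcast (the task)
w = fit_weights([followers, retweets], target, items)
```

The supervised task disambiguates the measures: the learned weights favor whichever sociometric actually predicts rebroadcasting.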
<br><b>Bio:</b>
<p style="margin-bottom: 0in">
Prem Melville received a Ph.D. in Computer Science from the University of Texas at Austin, and is currently a Research Scientist in the Machine Learning Group at IBM Research. His research interests lie in Machine Learning and Data Mining. More specifically, he has worked on active learning, ensemble methods, active feature-value acquisition, sentiment analysis, dual supervision, rank aggregation, recommender systems, text classification, and applications of data mining to social media analytics, business analytics, and e-commerce. Recently, Prem has served on the organizing committees of CIKM 2012, ICML 2011, MLG 2011, KDD 2010, and WSDM 2010. He also organized the first workshop on Budgeted Learning (at ICML 2010), the first workshop on Social Media Analytics (SOMA 2010, at KDD), and Mining and Learning with Graphs (MLG 2011). Prem also serves on the Editorial Board of Data Mining and Knowledge Discovery. He received the Best Application Paper Award at KDD 2010, and has won KDD Cup 2009, KDD Cup 2008, and the INFORMS Data Mining Contest 2008.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/8">AI</a> ⋅ <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/84">Inference and Search for Graphical Models</a><br><a href="http://www.ics.uci.edu/~dechter/">Rina Dechter - UC Irvine</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3258 A.V. Williams Building (AVW)</a><br>Thursday, April 12, 2012, 1:00-2:00 pm<br><br><b>Abstract:</b> <p>
Graphical models, e.g., Bayesian networks, Markov random fields, constraint networks and influence diagrams, are knowledge representation schemes that capture independencies in the knowledge base and support efficient, graph-based algorithms for a variety of reasoning tasks. Their applications include scheduling, planning, diagnosis and situation assessment, design, and hardware and software verification. Algorithms for reasoning in graphical models are of two primary types: inference-based (e.g., variable-elimination, join-tree clustering) and search-based. Exact inference-based algorithms are exponentially bounded (both time and space) by the tree-width of the graph. Search algorithms that explore an AND/OR search space can accommodate a more flexible time and memory tradeoff but their performance can also be bounded exponentially by the tree-width.<br>
<br>
In my talk I will present and contrast these two primary types of reasoning algorithms, and will then focus on bounded-inference approximations such as belief propagation and mini-bucket elimination. In particular, as time permits, I will show the gains obtained from a hybrid of search and inference, using mini-bucket lower-bound heuristics to guide AND/OR search, and will comment on how we can transition to an approximation scheme using graph-based AND/OR sampling.</p>
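The inference-based family can be illustrated with plain sum-product variable elimination on a tiny two-factor chain. This is a generic textbook sketch, not the talk's mini-bucket or AND/OR machinery, and the factors are invented:

```python
from itertools import product

# A factor is (vars, table): vars is a tuple of names; table maps each
# joint assignment (a tuple of 0/1 values, in vars order) to a number.

def multiply(a, b):
    """Pointwise product of two factors over the union of their scopes."""
    av, at = a
    bv, bt = b
    vs = av + tuple(v for v in bv if v not in av)
    table = {}
    for asg in product((0, 1), repeat=len(vs)):
        idx = dict(zip(vs, asg))
        table[asg] = at[tuple(idx[v] for v in av)] * bt[tuple(idx[v] for v in bv)]
    return vs, table

def sum_out(f, var):
    """Marginalize a variable out of a factor."""
    vs, t = f
    keep = tuple(v for v in vs if v != var)
    out = {}
    for asg, val in t.items():
        idx = dict(zip(vs, asg))
        key = tuple(idx[v] for v in keep)
        out[key] = out.get(key, 0.0) + val
    return keep, out

def eliminate(factors, order):
    """Sum-product variable elimination: for each variable in the order,
    multiply the factors mentioning it and sum it out."""
    factors = list(factors)
    for var in order:
        bucket = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = bucket[0]
        for f in bucket[1:]:
            prod = multiply(prod, f)
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Chain A - B - C with pairwise potentials that favor agreement.
agree = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
fAB = (("A", "B"), dict(agree))
fBC = (("B", "C"), dict(agree))
vs, t = eliminate([fAB, fBC], ["A", "B"])  # unnormalized marginal over C
z = sum(t.values())
pC = {k: v / z for k, v in t.items()}
```

The cost of each elimination step is exponential in the size of the intermediate factor, which is how the tree-width bound on exact inference arises.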
<br><b>Bio:</b> <div class="text">
Rina Dechter is a professor of Computer Science at the University of California, Irvine. She received her PhD in Computer Science from UCLA in 1985, an M.S. in Applied Mathematics from the Weizmann Institute, and a B.S. in Mathematics and Statistics from the Hebrew University of Jerusalem. Her research centers on computational aspects of automated reasoning and knowledge representation, including search, constraint processing, and probabilistic reasoning.</div>
<br>
<div class="text">
Professor Dechter is the author of <i>Constraint Processing</i> (Morgan Kaufmann, 2003), has authored over 100 research papers, and has served on the editorial boards of Artificial Intelligence, the Constraints journal, the Journal of Artificial Intelligence Research, and Logical Methods in Computer Science (LMCS). She was awarded the Presidential Young Investigator award in 1991, has been a <a href="http://www.aaai.org/Awards/fellows-list.php">fellow of the American Association for Artificial Intelligence since 1994</a>, was a <a href="http://www.radcliffe.edu/fellowships/fellows_1029.aspx">Radcliffe Fellow in 2005-2006</a>, and received the 2007 Association for Constraint Programming <a href="http://www.ics.uci.edu/%7Edechter/acp_award.html">(ACP) Research Excellence Award</a>.</div>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/8">AI</a> ⋅ <a href="https://talks.cs.umd.edu/lists/7">CLIP</a> ⋅ <a href="https://talks.cs.umd.edu/lists/2">CS Department</a> ⋅ <a href="https://talks.cs.umd.edu/lists/15">DBChat</a> ⋅ <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/95">Scaling Up Learning: from Big Data to Little Features</a><br><a href="http://research.microsoft.com/en-us/um/people/mbilenko/">Misha Bilenko - Microsoft Research</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">1146 A.V. Williams Building (AVW)</a><br>Wednesday, April 25, 2012, 1:30-2:30 pm<br><br><b>Abstract:</b> <p>
<span style='font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);'>It is a common assumption that dealing with “big data” is the main challenge in scaling up machine learning and prediction tasks. However, training-set size is only one of many motivations for developing high-performance learning methods. The talk will illustrate the diversity of efficiency-related problems in machine learning with a brief survey of several canonical algorithms and application scenarios. It will then discuss a scaling-motivated problem that has received little attention in the literature but is ubiquitous in industrial applications: predicting the relevance of a new feature. While identifying new informative features is the main pathway to improving accuracy in mature applications, evaluating every potential feature by re-training can be costly computationally, logistically, and financially. The talk will describe a principled, learner-independent technique for estimating new-feature utility, derived via a connection between a feature’s loss-reduction potential and its correlation with the loss gradient, leading to a simple, embarrassingly parallel hypothesis-testing procedure.</span></p>
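The gradient-correlation idea admits a compact sketch for squared loss, where the per-example loss gradient is the residual: correlate a candidate feature with the residuals of the current model and apply a rough Fisher-transform test. This is an illustrative reconstruction under those assumptions, not the exact procedure from the talk, and the data are synthetic:

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def feature_utility(candidate, predictions, targets):
    """Score a candidate feature by its correlation with the loss gradient.
    For squared loss the per-example gradient is (prediction - target), so a
    feature correlated with the residual has loss-reduction potential.
    Returns (|r|, |z|): z is a Fisher-transform statistic; a large z
    suggests the feature is worth the cost of re-training."""
    grad = [p - t for p, t in zip(predictions, targets)]
    r = pearson(candidate, grad)
    n = len(candidate)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return abs(r), abs(z)

random.seed(0)
n = 200
x_new = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
y = [2 * a + b for a, b in zip(x_new, noise)]  # x_new truly matters
preds = [0.0] * n                              # current model ignores it
useful_r, useful_z = feature_utility(x_new, preds, y)
junk = [random.gauss(0, 1) for _ in range(n)]  # an irrelevant candidate
junk_r, junk_z = feature_utility(junk, preds, y)
```

Each candidate feature is scored independently of the others and of the learner, which is what makes the procedure embarrassingly parallel.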
<br><b>Bio:</b> <p>
<span style='font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);'>Misha Bilenko is a researcher in the Machine Learning Group at Microsoft Research, which he joined after receiving his Ph.D. from the University of Texas at Austin. He is interested in learning algorithms and systems for large-scale behavioral, transactional, and textual tasks. Problems on which he has worked extensively include entity resolution, semi-supervised clustering, and prediction tasks in online advertising. His work has received best paper awards from SIGIR and KDD. He recently co-edited the “Scaling Up Machine Learning” collection published by Cambridge University Press.</span></p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/8">AI</a> ⋅ <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/107">Modeling Individual and Population traits from Clinical Temporal Data</a><br><a href="http://www.suchisaria.com/">Suchi Saria - Johns Hopkins University</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">2460 A.V. Williams Building (AVW)</a><br>Wednesday, May 2, 2012, 1:30-2:30 pm<br><br><b>Abstract:</b> <div>
<span>Physiological data are routinely recorded in intensive care, but their use for rapid assessment of illness severity has been limited. The data are high-dimensional and noisy and change rapidly; moreover, small changes that occur in a patient's physiology over long periods of time are difficult to detect, yet can lead to catastrophic outcomes. A physician's ability to recognize complex patterns across these high-dimensional measurements is limited. We propose a nonparametric Bayesian method for discovering informative representations in such continuous time series that aids both exploratory data analysis and feature construction. When applied to data from premature infants in the neonatal ICU (NICU), our model yields novel clinical insights. Based on these insights, we devised Physiscore, a novel risk-prediction score that combines patterns from continuous physiological signals to predict which infants are at risk of developing major complications in the NICU. Using only 3 hours of non-invasive data from birth, Physiscore predicts morbidity in preterm infants with high accuracy, performing consistently better than other neonatal scoring systems, including the Apgar score, the current standard of care, and SNAP, a machine-learning-based score that requires multiple invasive tests. This work was published on the cover of Science Translational Medicine (Science's new journal aimed at translational medicine) and was covered by numerous press sources.</span></div>
<br><b>Bio:</b> <p>
Suchi Saria received her Ph.D. in Computer Science, with a focus on machine learning and clinical informatics, from Stanford University in 2011, advised by Daphne Koller. She is an Assistant Professor at Johns Hopkins University in the Department of Computer Science, within the School of Engineering, and in Health Policy, within the Bloomberg School of Public Health. She is also visiting Harvard Medical School as an NSF Computing Innovation Fellow this year. She has won various awards, including a Best Student Paper award and a Best Student Paper Finalist award, the Rambus Fellowship, a full Microsoft scholarship, and the National Science Foundation Computing Innovation Fellowship. Her research interests lie in developing novel machine learning and data-driven solutions for improving health-care delivery, both at the point of care and for analysis by policy makers. Her thesis work has been featured in national and international press outlets including CBS Radio, France's national newspaper Le Monde, and NIH's MedlinePlus.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/8">AI</a> ⋅ <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/112">The Wisdom of Crowds, Delivered to the Masses</a><br><a href="http://dpennock.com/">David Pennock - Microsoft Research</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">1146 A.V. Williams Building (AVW)</a><br>Wednesday, May 9, 2012, 1:30-2:30 pm<br><br><b>Abstract:</b> <p>
I will describe our efforts to create a hub for predictions, data, games, and infographics about the 2012 US Presidential election and other news, sports, and entertainment topics. As part of the project, we are building a "fantasy politics" game: a true combinatorial prediction market with over a quadrillion outcomes, where you can buy and sell nearly anything about the election, from "Obama will win both Florida and Ohio" to "There will be a path of blue from Canada to Mexico". Read more at <a href="http://bit.ly/combopmfreak">http://bit.ly/combopmfreak</a>.</p>
<br><b>Bio:</b> <p style="margin-left:.3in;">
David Pennock is a Principal Researcher and Assistant Managing Director of Microsoft Research in New York City, where he leads a group focused on algorithmic economics. He has over sixty academic publications on computational issues in electronic commerce and the web, including papers in PNAS, Science, IEEE Computer, Theoretical Computer Science, Algorithmica, Electronic Commerce Research, Electronic Markets, AAAI, EC, WWW, KDD, UAI, SIGIR, ICML, NIPS, INFOCOM, SAINT, ACM SIGCSE, and VLDB. He has authored two patents and ten patent applications. In 2005, he was named to MIT Technology Review's list of 35 top technology innovators under age 35. Before joining Microsoft, Dr. Pennock worked at Yahoo! Research, as a research scientist at NEC Laboratories America, as a research intern at Microsoft Research, and, in 2001, as an adjunct professor at Pennsylvania State University. He received a Ph.D. in Computer Science from the University of Michigan, an M.S. in Computer Science from Duke University, and a B.S. in Physics from Duke. Dr. Pennock's work has been featured in Discover Magazine, New Scientist, CNN, the New York Times, the Economist, Surowiecki's "The Wisdom of Crowds", and several other publications.</p>
<br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/8">AI</a> ⋅ <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/255">Better Learning and Inference with Dependency Networks</a><br><a href="http://ix.cs.uoregon.edu/~lowd/">Daniel Lowd - University of Oregon</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">3258 A.V. Williams Building (AVW)</a><br>Monday, January 28, 2013, 1:00-2:00 pm<br><br><b>Abstract:</b> <div style="color: #222222; font-family: arial, sans-serif; font-size: 13px;" dir="ltr">
<p>Bayesian and Markov networks have been widely successful for learning and reasoning in domains with uncertainty, but each has limitations. Dependency networks are an alternative graphical representation with more flexibility than Bayesian networks and more efficient learning methods than Markov networks. The disadvantages of dependency networks are that they may represent inconsistent probability distributions and that few inference algorithms are applicable. In this talk, I will show how we can improve the utility of dependency networks with new learning and inference algorithms. First, I will show how mean-field inference can be a faster alternative to Gibbs sampling, even in inconsistent dependency networks. Second, I will show how dependency networks can be used to learn better Markov networks in less time, compared to several state-of-the-art methods. Finally, I will introduce a new method for directly converting an inconsistent dependency network into a consistent Markov network.</p>
<p>Based on joint work with Jesse Davis and Arash Shamaei.</p>
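Ordered pseudo-Gibbs sampling, the standard inference procedure for dependency networks, can be sketched on a toy two-variable network whose per-variable conditionals need not be consistent with any joint distribution. The conditionals below are invented for illustration:

```python
import random

# A toy dependency network over binary variables: each variable comes
# with a conditional P(X_i = 1 | rest), with no guarantee that these
# conditionals are consistent with a single joint distribution.
def p_a(state):  # P(A=1 | B)
    return 0.9 if state["B"] else 0.2

def p_b(state):  # P(B=1 | A)
    return 0.7 if state["A"] else 0.3

CONDITIONALS = {"A": p_a, "B": p_b}

def gibbs(n_samples, burn_in=500, seed=0):
    """Ordered pseudo-Gibbs sampling: repeatedly resample each variable
    from its local conditional, in a fixed order. For consistent networks
    this converges to the joint; for inconsistent ones it still has a
    stationary distribution, which may depend on the update order."""
    rng = random.Random(seed)
    state = {v: 0 for v in CONDITIONALS}
    counts = {v: 0 for v in CONDITIONALS}
    for t in range(burn_in + n_samples):
        for v, cond in CONDITIONALS.items():
            state[v] = 1 if rng.random() < cond(state) else 0
        if t >= burn_in:
            for v in CONDITIONALS:
                counts[v] += state[v]
    return {v: counts[v] / n_samples for v in CONDITIONALS}

marginals = gibbs(20000)
```

Because every update needs only a local conditional, this is exactly the kind of procedure the talk's mean-field alternative aims to speed up.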
</div><br><b>Bio:</b> <p>Daniel Lowd is an Assistant Professor in the Department of Computer and Information Science at the University of Oregon. His research interests include learning and inference with probabilistic graphical models, adversarial machine learning, and statistical relational machine learning. He maintains Libra, an open-source toolkit for learning and inference with Bayesian networks, random fields, and arithmetic circuits.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br><br><a href="https://talks.cs.umd.edu/talks/1875">Tutorial on Deep Learning with Apache MXNet Gluon</a><br><a href="http://gluon.mxnet.io/">Alex Smola - Professor of Computer Science, Carnegie Mellon University; Director of Machine Learning and Deep Learning at AWS, Amazon</a><br><a href="http://www.umd.edu/CampusMaps/bld_detail.cfm?bld_code=AVW">4172 A.V. Williams Building (AVW)</a><br>Friday, September 22, 2017, 1:00-3:00 pm<br><br><b>Abstract:</b> <p>This tutorial introduces Gluon, a flexible new interface that pairs MXNet's speed with a user-friendly frontend. Symbolic frameworks like Theano and TensorFlow offer speed and memory efficiency but are harder to program. Imperative frameworks like Chainer and PyTorch are easy to debug but can seldom compete with symbolic code for speed. Gluon reconciles the two, removing a crucial pain point, by using just-in-time compilation and an efficient runtime engine. <br> <br> In this crash course, we'll cover deep learning basics, the fundamentals of Gluon, advanced models, and multi-GPU deployments. We will walk you through MXNet's NDArray data structure and automatic differentiation tools. We'll show you how to define neural networks both at the atomic level and through Gluon's predefined layers. We'll demonstrate how to serialize models and build dynamic graphs. Finally, we will show you how to hybridize your networks, simultaneously enjoying the benefits of imperative and symbolic deep learning.</p><br>This talk is part of the following lists: <a href="https://talks.cs.umd.edu/lists/2">CS Department</a> ⋅ <a href="https://talks.cs.umd.edu/lists/9">ML Seminar</a><br>