Approximate dynamic programming (ADP) is a powerful methodology for solving large-scale, complex stochastic optimization problems in energy, inventory management, finance, transportation, and artificial intelligence. In such problems, each decision carries immediate economic benefits (selling inventory, allocating resources), but it also provides information about the problem that helps us make better decisions in the future. This tradeoff, known as "exploration vs. exploitation," has long been studied in the literature on optimal learning and multi-armed bandits, but much of the work in this area does not easily carry over to ADP. We bridge the gap between optimal learning and ADP using the concept of "value of information." We interpret information as an economic quantity, which can thus be traded off against other economic benefits. The value of information depends on a Bayesian belief about the quality of a decision, but also accounts for the uncertainty inherent in that belief. We show how this approach can be integrated into powerful classes of ADP models such as basis functions and hierarchical aggregation to efficiently optimize the decision-making policy. Both theoretical and experimental results are discussed.
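The value-of-information idea described above can be illustrated with the knowledge-gradient computation for independent normal beliefs, a standard construction in the optimal-learning literature. The sketch below is illustrative only and is not taken from the talk; the function name and the modeling assumptions (independent normal priors, normal measurement noise with known variance) are choices made here for concreteness.

```python
import math

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    # standard normal cumulative distribution
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def knowledge_gradient(means, variances, noise_var):
    """Value of information for sampling each alternative once,
    under independent normal beliefs (means, variances) and
    normal observation noise with variance noise_var."""
    kg = []
    for x, (mu, var) in enumerate(zip(means, variances)):
        # std. dev. of the change in the posterior mean after one sample
        sigma_tilde = var / math.sqrt(var + noise_var)
        best_other = max(m for y, m in enumerate(means) if y != x)
        zeta = -abs(mu - best_other) / sigma_tilde
        # expected improvement in the best posterior mean
        kg.append(sigma_tilde * (zeta * Phi(zeta) + phi(zeta)))
    return kg

# Two alternatives with equal estimated means: the more uncertain one
# carries more value of information, so a VOI policy explores it first.
voi = knowledge_gradient(means=[1.0, 1.0], variances=[1.0, 4.0], noise_var=1.0)
```

Note how the computation uses both the belief (the means) and the uncertainty in that belief (the variances), exactly the tradeoff the abstract describes: with identical means, the policy prefers the alternative whose outcome it knows least about.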
Ilya O. Ryzhov is an Assistant Professor of Operations Management and Management Science in the Robert H. Smith School of Business, University of Maryland. He received a Ph.D. in Operations Research and Financial Engineering from Princeton University in 2011. He is also the co-author (with W.B. Powell) of Optimal Learning, published in 2012 by John Wiley and Sons. His research deals with efficient information collection in application areas such as energy, disaster relief, and operations management.