Adversarial Bandits with Knapsacks
Karthik Abinav Sankararaman
Friday, March 1, 2019, 1:00-2:00 pm

In this talk, we will discuss the multi-armed bandit problem with resource constraints in the adversarial setting. The problem is an interactive, repeated game between an algorithm and an adversary. Given T time-steps, d resources, m actions, and budgets B1, B2, ..., Bd, the algorithm chooses one of the m actions at each time-step. An adversary then reveals the reward of this action and its consumption of each of the d resources. At the first time-step at which the algorithm runs out of some resource (i.e., the total consumption of some resource j exceeds Bj), the game stops, and the total reward is the sum of the rewards obtained up to that stopping time. The goal is to maximize the competitive ratio: the ratio of the total reward of the algorithm to the expected reward of the best fixed distribution over actions that knows the rewards and consumptions for all actions and time-steps beforehand. We give an algorithm for this problem whose competitive ratio is tight (it matches the lower bound). The algorithm achieves the optimal bound both in expectation and with high probability. Moreover, the algorithmic framework is modular and can be used in a black-box manner to obtain algorithms for many related problems, such as the stochastic, contextual, and semi-bandit settings, as well as the more general online constrained convex optimization setting.
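To make the game protocol above concrete, here is a minimal sketch of the interaction loop: the algorithm picks an action, the adversary reveals a reward and per-resource consumption, and play stops once any budget would be exceeded. All function and variable names (`play_bwk`, `toy_adversary`, `always_zero`) are illustrative assumptions, not from the paper, and the toy adversary and policy stand in for the actual adversarial sequence and the paper's algorithm.

```python
def play_bwk(T, budgets, adversary, policy):
    """Simulate the adversarial Bandits-with-Knapsacks game loop.

    adversary(t, action) -> (reward, consumption_vector of length d)
    policy(t, history)   -> index of one of the m actions
    (Illustrative interface; the paper's algorithm would play the policy role.)
    """
    d = len(budgets)
    spent = [0.0] * d        # cumulative consumption per resource
    total_reward = 0.0
    history = []             # observed (action, reward, consumption) triples
    for t in range(T):
        a = policy(t, history)
        reward, cons = adversary(t, a)
        # The game stops as soon as any resource budget would be exceeded.
        if any(spent[j] + cons[j] > budgets[j] for j in range(d)):
            break
        for j in range(d):
            spent[j] += cons[j]
        total_reward += reward
        history.append((a, reward, cons))
    return total_reward

# Toy instance: m = 2 actions, d = 1 resource, a fixed (oblivious) adversary.
def toy_adversary(t, action):
    # Action 0: reward 1, consumes 1 unit; action 1: reward 0, consumes nothing.
    return (1.0, [1.0]) if action == 0 else (0.0, [0.0])

def always_zero(t, history):
    return 0

reward = play_bwk(T=10, budgets=[3.0], adversary=toy_adversary,
                  policy=always_zero)
print(reward)  # stops once the budget of 3 units is exhausted -> 3.0
```

In this toy run the fixed policy earns one unit of reward per step until the single resource budget of 3 is used up, so the game ends after 3 of the 10 time-steps; the benchmark in the abstract would instead be the best fixed distribution over the two actions chosen with full foresight.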

The talk is based on this paper: https://arxiv.org/abs/1811.11881

This talk is organized by Karthik Abinav Sankararaman