Tutorial on Bandits Games
In the recent years the multi-armed bandit problem has attracted a lot of attention in the theoretical learning community. This growing interest is a consequence of the large number of problems that can be modeled as a multi-armed bandit: ad placement, website optimization, packet routing, ect. Furthermore the bandit methodology is also used as a building block for more complicated scenarios such as reinforcement learning, model selection in statistics, or computer game-playing. While the basic stochastic multi-armed bandit can be traced back to Thompson (1933) and Robbins (1952), it is only very recently that we obtained an (almost) complete understanding of this simple model. Moreover many extensions of the original problem have been proposed in the past fifteen years, such as bandits without a stochastic assumption (the so-called adversarial model), or bandits with a very large (but structured) set of arms.
This tutorial will cover in details the state-of-the-art for the basic multi-armed bandit problem (both stochastic and adversarial), and the information theoretic analysis of Bayesian bandit problems. We will also touch upon contextual bandits, as well as the case of very large (possibly infinite) set of arms with linearconvexLipschitz losses.
Old version of the slides