
Survey of multi-armed bandit algorithms applied to recommendation systems

Elena Gangan    
Milos Kudus    
Eugene Ilyushin    

Abstract

The main goal of this paper is to introduce the reader to multi-armed bandit algorithms of different types and to observe how the industry has leveraged them to advance recommendation systems. We present the current state of the art in RecSys and then explain what multi-armed bandits can bring to the table. First, we present the formalization of the multi-armed bandit problem and describe the most common bandit algorithms, such as upper confidence bound (UCB), Thompson Sampling, epsilon-greedy, and EXP3. Expanding on that knowledge, we review some important contextual bandits and present their benefits and usage in different applications. In this setting, context means side information about the users or the items of the problem. We also survey various techniques that make bandits better suited to the task of recommendation; namely, we consider bandits with multiple plays, multi-objective optimization, and clustering and collaborative filtering approaches. We also assess bandit-backed recommendation systems implemented in industry at Spotify, Netflix, Yahoo, and others. Finally, we discuss methods of bandit evaluation and present an empirical evaluation of some well-known algorithms. We conduct short experiments on two datasets to show how different policies compare to each other and to observe the importance of parameter tuning. This paper is thus a survey of multi-armed bandit algorithms and their applications to recommendation systems.
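
As a concrete preview of the simplest policy named above, the following minimal epsilon-greedy sketch (in Python) shows the explore/exploit mechanism the survey builds on. It is an illustrative sketch only: the three-arm Bernoulli simulation, the epsilon value of 0.1, and all names in it are our own assumptions, not the paper's experimental setup.

# Minimal epsilon-greedy bandit sketch (illustrative; arm count, epsilon,
# and the Bernoulli reward simulation below are assumptions).
import random

class EpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon          # exploration probability
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm

    def select_arm(self):
        # With probability epsilon explore a random arm; otherwise
        # exploit the arm with the highest estimated mean reward.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incrementally update the chosen arm's mean reward estimate.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Usage: simulate three Bernoulli arms with hidden success probabilities.
true_probs = [0.2, 0.5, 0.7]
bandit = EpsilonGreedy(n_arms=3, epsilon=0.1)
for _ in range(10_000):
    arm = bandit.select_arm()
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    bandit.update(arm, reward)
print(bandit.values)  # estimates should approach true_probs

UCB, Thompson Sampling, and EXP3 replace the uniform exploration step with, respectively, confidence-bound bonuses, posterior sampling, and adversarial weight updates, as discussed later in the survey.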