Course on Reinforcement Learning

 

Abstract

This course introduces the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. In particular, we will focus on the frameworks of reinforcement learning and multi-armed bandits. The main topics studied during the course are:


-Historical multi-disciplinary basis of reinforcement learning

-Markov decision processes and dynamic programming

-Stochastic approximation and Monte-Carlo methods

-Introduction to stochastic and adversarial multi-armed bandits

-Approximate dynamic programming


Where and When


The course on “Reinforcement Learning” will be held at the Ecole Centrale de Lille. Lectures take place in room B7-14 and practical sessions in room C016.


Schedule


See hyperplanning.

Lectures

News

  1. Text of the first Homework: homework1-tree.pdf

  2. Text of the second Homework: homework2.pdf code.zip

  3. Text of the third Homework: homework3.pdf code.zip

  4. Planning: planning.pdf

Proposed papers to review


Advertising and recommendation

  1. “A Contextual-Bandit Approach to Personalized News Article Recommendation”

  2. Google study on multi-armed bandits for Google Analytics

  3. J. Mary, R. Gaudel, Ph. Preux, Bandits Warm-up Cold Recommender Systems


Games

  1. “Regret Minimization in Games with Incomplete Information”

  2. “Approximate Dynamic Programming Finally Performs Well in the Game of Tetris”

  3. “Playing Atari with Deep Reinforcement Learning”


Finance

  1. John Moody and Matthew Saffell. Learning to trade via direct reinforcement, 2001

  2. “Censored Exploration and the Dark Pool Problem”

  3. “Reinforcement Learning for Optimized Trade Execution”

  4. Beomsoo Park and Benjamin Van Roy. Adaptive execution: Exploration and learning of price impact


Robotics

  1. “Reinforcement Learning in Robotics: A Survey”

  2. “Autonomous inverted helicopter flight via reinforcement learning”


Control for energy management

  1. “Adaptive Stochastic Control for Smart Grids”

  2. “An Intelligent Battery Controller Using Bias-Corrected Q-learning”

  3. Ying Tan, Wei Liu, and Qinru Qiu. Adaptive power management using reinforcement learning


Other control applications

  1. “An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application”

  2. “Reinforcement Learning-based Control of Traffic Lights in Non-stationary Environments”

  3. “Optimizing Dialogue Management with Reinforcement Learning”

  4. “RL-MAC: a reinforcement learning based MAC protocol for wireless sensor networks”

  5. “Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems”

  6. “Reinforcement Learning for Elevator Control”