Course on Reinforcement Learning



Introduction to the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. In particular, we focus on the frameworks of reinforcement learning and multi-armed bandits. The main topics covered in the course are:

- Historical multi-disciplinary basis of reinforcement learning

- Markov decision processes and dynamic programming

- Stochastic approximation and Monte-Carlo methods

- Function approximation and statistical learning theory

- Approximate dynamic programming

- Introduction to stochastic and adversarial multi-armed bandits

- Learning rates and finite-sample analysis
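As a small taste of the dynamic-programming topic above, here is a minimal value-iteration sketch on a toy two-state MDP. All transition probabilities and rewards below are illustrative inventions, not examples from the course.

```python
# Minimal value iteration on a toy 2-state, 2-action MDP (illustrative numbers).
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)],                 # stay in state 0, no reward
        1: [(0.7, 1, 1.0), (0.3, 0, 0.0)]}, # try to reach state 1
    1: {0: [(1.0, 0, 0.0)],                 # go back to state 0
        1: [(1.0, 1, 2.0)]},                # stay in state 1, reward 2
}
gamma = 0.9  # discount factor

# Repeatedly apply the Bellman optimality operator until (near) convergence.
V = {s: 0.0 for s in P}
for _ in range(1000):
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s])
         for s in P}

# Extract a greedy policy from the (approximately) optimal value function.
policy = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)
```

Since the Bellman operator is a gamma-contraction, the loop converges geometrically; here the optimal policy takes action 1 in both states, and V(1) = 2 / (1 - 0.9) = 20.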


  1. 03/10 -- Markov Decision Processes [Salle Condorcet, d’Alembert]

  2. 10/10 -- Dynamic Programming [Salle Condorcet, d’Alembert]

  3. 20/10 -- Reinforcement Learning [Salle Condorcet, d’Alembert]

  4. 24/10 -- Practical session on Dynamic Programming and Reinforcement Learning [Salle Condorcet, d’Alembert]

  5. 31/10 -- Exploration-exploitation: Multi-armed Bandit [Salle Condorcet, d’Alembert]

  6. 9/11 -- Exploration-exploitation: beyond Multi-armed Bandit [Salle Condorcet, d’Alembert]

  7. 16/11 -- Practical session on Multi-armed Bandit [Salle Condorcet, d’Alembert]

  8. 21/11 -- Approximate Dynamic Programming [Salle Condorcet, d’Alembert]

  9. 30/11 -- Policy Search Algorithms and Deep RL [Salle Condorcet, d’Alembert]

  10. 19/12 -- Practical session on ADP [Salle Condorcet, d’Alembert]

  1. Around 10/01/2017 -- Deadline for project proposal submission

  2. Around 17/01/2017 -- Presentations


The course will be evaluated based on the points collected in the practical sessions and on a final project. Project proposals, internships, and PhD positions will be announced towards the end of October.

New material


  1. Projects page:

  2. Report submission deadline: JANUARY 19th at MIDNIGHT

  3. Presentation day: JANUARY 24th and 25th

  4. Homework page:

  5. New material on exploration-exploitation and on bandits for learning Nash equilibria in two-player zero-sum games (not covered in the course!)