Course on Reinforcement Learning


Lecture 0: Introduction to the Course


This course introduces the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. In particular, we will focus on the frameworks of reinforcement learning and the multi-armed bandit. The main topics studied during the course are:

-	Historical multi-disciplinary basis of reinforcement learning
-	Markov decision processes and dynamic programming
-	Stochastic approximation and Monte-Carlo methods
-	Function approximation and statistical learning theory
-	Approximate dynamic programming
-	Introduction to stochastic and adversarial multi-armed bandits
-	Learning rates and finite-sample analysis

Where and When

The course on “Reinforcement Learning” will be held at the Department of Mathematics at ENS Cachan, every Tuesday from September 29th to December 15th, from 11:00 to 13:00.


The schedule may change in the coming weeks.

 29/09 -- Markov Decision Processes [Salle Conférence, Pavillon des Jardins]
 06/10 -- Dynamic Programming [Salle Condorcet, d’Alembert]
 13/10 -- Reinforcement Learning [Salle Condorcet, d’Alembert]
 20/10 -- Practical session on Dynamic Programming and Reinforcement Learning [Salle Condorcet, d’Alembert]
 27/10 -- Multi-armed Bandit (1) [Salle Condorcet, d’Alembert]
 03/11 -- Practical session on Multi-armed Bandit [Amphi Curie, d’Alembert]
 10/11 -- [CANCELED, moved to 17/11 afternoon] Multi-armed Bandit (2) and announcement and assignment of projects [Amphi Curie, d’Alembert]
 17/11 [MORNING] -- Practical session on Multi-armed Bandit [Salle Condorcet, d’Alembert]
 17/11 [AFTERNOON 2:45pm-4:45pm] -- Multi-armed Bandit (2) and announcement and assignment of projects [Salle Condorcet, d’Alembert]
 24/11 -- Approximate Dynamic Programming [Salle Condorcet, d’Alembert]
 01/12 -- Sample Complexity of Approximate Dynamic Programming [Salle Condorcet, d’Alembert]
 15/12 -- Practical session on ADP [Salle Condorcet, d’Alembert]
 15/12 -- Guest lecture [Salle Condorcet, d’Alembert]


The course will be evaluated on the basis of the points collected in the practical sessions and a final project. Project proposals, internships, and PhD positions will be announced towards mid-November.


** Slides will be uploaded before each class (otherwise see slides from last year).

** The lecture notes are somewhat outdated; if you want to consult them, refer to the material from last year.


  1. Changes in the schedule: the class on 10/11 is canceled and moved to 17/11 in the afternoon, from 2:45pm to 6:45pm in Salle Condorcet. The session on 17/11 in the morning is confirmed from 11am to 1pm as usual.

  2. The first round of project proposals is available here. The proposals will be updated and extended in the coming days.

  3. Changes in the schedule: there will be a lecture on 01/12, the class on 08/12 is canceled, and the last class on 15/12 will be the final practical session (TP) of the course.

Lecture 1: A Bit of History

Lecture 2: MDP and Dynamic Programming

Lecture 3: Reinforcement Learning Algorithms

Lecture 4: The Multi-Armed Bandit Framework

Lecture 5: Approximate Dynamic Programming