Course on Reinforcement Learning

 


Abstract


This course is an introduction to the models and mathematical tools used to formalize the problem of learning and decision-making under uncertainty. In particular, we will focus on the frameworks of reinforcement learning and multi-armed bandits. The main topics studied during the course are:


- Historical multi-disciplinary basis of reinforcement learning

- Markov decision processes and dynamic programming

- Stochastic approximation and Monte-Carlo methods

- Function approximation and statistical learning theory

- Approximate dynamic programming

- Introduction to stochastic and adversarial multi-armed bandits

- Learning rates and finite-sample analysis


Where and When


The course on “Reinforcement Learning” will be held at the Department of Mathematics of ENS Cachan, every Tuesday from October 1st to December 17th, from 11:00 to 13:00, in room C103 (C109 for practical sessions).


Schedule


  1. 01/10 -- Markov Decision Processes

  2. 08/10 -- Dynamic Programming

  3. 15/10 -- Reinforcement Learning

  4. 22/10 -- Practical session on Dynamic Programming and Reinforcement Learning

  5. 29/10 -- Multi-armed Bandit (1)

  6. 05/11 -- Practical session on Multi-armed Bandit

  7. 12/11 -- Multi-armed Bandit (2)

  8. 19/11 -- Practical session on Multi-armed Bandit

  9. 26/11 -- Approximate Dynamic Programming

  10. 03/12 -- Sample Complexity of Approximate Dynamic Programming

  11. 10/12 -- Practical session on ADP

  12. 17/12 -- Guest lectures + internship proposals

  13. 14/01 -- Evaluation

Lectures


Lecture 0: Introduction to the Course

Lecture 1: Introduction to Reinforcement Learning

Lecture 2: Markov Decision Processes and Dynamic Programming

Lecture 3: Reinforcement Learning Algorithms

Lecture 4: The Multi-arm Bandit Problem

Lecture 5: Approximate Dynamic Programming

Lecture 6: Sample Complexity of ADP Algorithms

News

  1. PRESENTATION DAY: Wednesday 22nd, starting at 9:15. Full schedule: schedule.pdf. Presentation format: 15 min + 5 min. Attendance is requested ONLY for your own presentation.

  2. DEADLINE EXTENSION: the deadline for submitting the report is extended to JANUARY 15th. Further information about the room and the schedule of the presentations will be available soon.

  3. A nice tool to simulate value iteration, policy iteration, and Q-learning is available at http://www.cs.cmu.edu/~awm/rlsim/ (a small value-iteration sketch is given after this list).

  4. The files to simulate the behavior of UCB are here: ucb.m plot_arms.m (a small UCB sketch is given after this list).

  5. The initial list of projects is available: mini-projects.pdf.

  6. The (tentative) timeline for the projects is: Dec 17th, all assignments done; Jan 8th, submission of the report; Jan 21st, presentation session.
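
As a reference for item 3, here is a minimal sketch of tabular value iteration in Python, in the spirit of what the rlsim tool animates. The arrays P and R and the discount gamma are illustrative placeholders, not part of the course material.

    import numpy as np

    def value_iteration(P, R, gamma=0.95, tol=1e-8):
        """Tabular value iteration on a finite MDP.

        P: transition probabilities, shape (S, A, S), P[s, a, s2] = Pr(s2 | s, a).
        R: expected immediate rewards, shape (S, A).
        Returns the (approximately) optimal value function and a greedy policy.
        """
        V = np.zeros(R.shape[0])
        while True:
            # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_s2 P(s, a, s2) V(s2)
            Q = R + gamma * (P @ V)
            V_new = Q.max(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmax(axis=1)
            V = V_new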

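As a reference for item 4, a minimal Python sketch of the UCB1 index rule on Bernoulli arms. It is independent of the provided ucb.m and plot_arms.m files, and the arm means below are only illustrative.

    import numpy as np

    def ucb1(means, horizon=10000, seed=0):
        """Simulate UCB1 on Bernoulli arms with the given means (unknown to the learner)."""
        rng = np.random.default_rng(seed)
        K = len(means)
        counts = np.zeros(K)   # number of pulls of each arm
        sums = np.zeros(K)     # sum of rewards observed for each arm
        regret = 0.0
        best = max(means)
        for t in range(1, horizon + 1):
            if t <= K:
                arm = t - 1    # pull each arm once to initialize the estimates
            else:
                # UCB1 index: empirical mean + sqrt(2 log t / number of pulls)
                index = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
                arm = int(np.argmax(index))
            reward = rng.binomial(1, means[arm])
            counts[arm] += 1
            sums[arm] += reward
            regret += best - means[arm]
        return counts, regret

    # Example: counts, regret = ucb1([0.9, 0.8, 0.5], horizon=5000)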