Course on Reinforcement Learning

Rules for the homework

  1. You can work on it in pairs.

  2. The deadline is strict. A delay of up to 6 hours incurs a penalty of -1 point, a delay of up to 24 hours a penalty of -2 points, and any longer delay a penalty of -5 points. Penalties are expressed on the 20-point grade scale.

  3. Each homework is worth 3.5 points.

  4. The submission should be done by email with “[EC]” in the subject.

  5. The submission should consist of the code and a detailed report (which can be generated automatically from Matlab comments).

  6. Both the correctness of the code and the quality of the report will be taken into consideration in the evaluation.

Homework assignments

  1. Text of the first homework: homework1-tree.pdf

Proposed papers to review


=== Application to computer games ===


Extensive-form games (poker)

  1. Regret Minimization in Games with Incomplete Information

  2. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

  3. An Introduction to Counterfactual Regret Minimization

Other games

  1. Mastering the game of Go with deep neural networks and tree search

  2. Human-level control through deep reinforcement learning (and the debate around it)

  3. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris


=== Advertising and recommendation ===


  1. A Contextual-Bandit Approach to Personalized News Article Recommendation

  2. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees

  3. Cascading Bandits for Large-Scale Recommendation Problems

  4. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation

  5. A Multiple-Play Bandit Algorithm Applied to Recommender Systems


=== Education ===


  1. Multi-Armed Bandits for Intelligent Tutoring Systems

  2. Offline Policy Evaluation Across Representations with Applications to Educational Games

  3. Trading Off Scientific Knowledge and User Learning with Multi-Armed Bandits

  4. Multi-Armed Bandit Problem and Its Applications in Intelligent Tutoring Systems [This is just a master's thesis]


=== Finance ===


  1. John Moody and Matthew Saffell. Learning to trade via direct reinforcement, 2001

  2. "Censored Exploration and the Dark Pool Problem"

  3. “Reinforcement Learning for Optimized Trade Execution”

  4. Beomsoo Park and Benjamin Van Roy. Adaptive execution: Exploration and learning of price impact


=== Other applications (some of them are quite old) ===


  1. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

  2. Reinforcement Learning for Electric Power System Decision and Control

  3. An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application

  4. Reinforcement Learning-based Control of Traffic Lights in Non-stationary Environments

  5. Optimizing Dialogue Management with Reinforcement Learning

  6. RL-MAC: a reinforcement learning based MAC protocol for wireless sensor networks

  7. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems

  8. Reinforcement Learning for Elevator Control

  9. Reinforcement Learning in Robotics: A Survey

  10. Autonomous inverted helicopter flight via reinforcement learning

  11. Adaptive Stochastic Control for Smart Grids

  12. An Intelligent Battery Controller Using Bias-Corrected Q-learning

  13. Ying Tan, Wei Liu, and Qinru Qiu. Adaptive power management using reinforcement learning


=== Other topics in RL ===


  1. Deep Reinforcement Learning: An Overview (it also contains many pointers to applications)

  2. Inverse reinforcement learning [1]

  3. Exploration vs. exploitation [1] [2]

  4. A Survey on Policy Search for Robotics

Rules for the presentations

  1. Choose two papers (unless you select a single very long one)

  2. The review can be done in pairs

  3. Presentations with slides last 25 minutes at most and should be balanced between the two presenters

  4. Register your papers and presentation slot on https://goo.gl/yk5nFv

Approximate dynamic programming with addendum