Course on Reinforcement Learning

Rules for the homework

  1. You can work on it in pairs.

  2. The deadline is strict. A delay of up to 6 hours incurs a penalty of -1 point, a delay of up to 24 hours a penalty of -2 points, and any longer delay a penalty of -5 points. Penalties are expressed on the 20-point grade scale.

  3. Each homework is worth 3.5 points.

  4. The submission should be done by email with “[EC]” in the subject.

  5. The submission should consist of the code and a detailed report (which can be generated automatically from Matlab comments).

  6. Both the correctness of the code and the quality of the report will be taken into consideration in the evaluation.

Homework assignments

  1. Text of the first homework: homework1-tree.pdf

Proposed papers to review


=== Application to computer games ===


Extensive-form games (poker)

  1. Regret Minimization in Games with Incomplete Information

  2. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

  3. An Introduction to Counterfactual Regret Minimization

Other games

  1. Mastering the game of Go with deep neural networks and tree search

  2. Human-level control through deep reinforcement learning (and the debate around it)

  3. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris


=== Advertising and recommendation ===


  1. A Contextual-Bandit Approach to Personalized News Article Recommendation

  2. Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees

  3. Cascading Bandits for Large-Scale Recommendation Problems

  4. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation

  5. A Multiple-Play Bandit Algorithm Applied to Recommender Systems


=== Education ===


  1. Multi-Armed Bandits for Intelligent Tutoring Systems

  2. Offline Policy Evaluation Across Representations with Applications to Educational Games

  3. Trading Off Scientific Knowledge and User Learning with Multi-Armed Bandits

  4. Multi-Armed Bandit Problem and Its Applications in Intelligent Tutoring Systems [This is just a master's thesis]


=== Finance ===


  1. John Moody and Matthew Saffell. Learning to trade via direct reinforcement, 2001

  2. "Censored Exploration and the Dark Pool Problem"

  3. “Reinforcement Learning for Optimized Trade Execution”

  4. Beomsoo Park and Benjamin Van Roy. Adaptive execution: Exploration and learning of price impact


=== Other applications (some of them are quite old) ===


  1. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

  2. Reinforcement Learning for Electric Power System Decision and Control

  3. An Approximate Dynamic Programming Algorithm for Large-Scale Fleet Management: A Case Application

  4. Reinforcement Learning-based Control of Traffic Lights in Non-stationary Environments

  5. Optimizing Dialogue Management with Reinforcement Learning

  6. RL-MAC: a reinforcement learning based MAC protocol for wireless sensor networks

  7. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems

  8. Reinforcement Learning for Elevator Control

  9. Reinforcement Learning in Robotics: A Survey

  10. Autonomous inverted helicopter flight via reinforcement learning

  11. Adaptive Stochastic Control for Smart Grids

  12. An Intelligent Battery Controller Using Bias-Corrected Q-learning

  13. Ying Tan, Wei Liu, and Qinru Qiu. Adaptive power management using reinforcement learning


=== Other topics in RL ===


  1. Deep Reinforcement Learning: An Overview (it also contains many pointers to applications)

  2. Inverse reinforcement learning [1]

  3. Exploration vs. exploitation [1] [2]

  4. A Survey on Policy Search for Robotics

Rules for the presentations

  1. Choose two papers (unless you select a single very long one)

  2. The review can be done in pairs

  3. Presentations with slides last 25 minutes at most and should be balanced between the two presenters

  4. Register your papers and presentation slot on https://goo.gl/yk5nFv

Approximate dynamic programming with addendum