Where: SEQUEL team – INRIA Lille Nord Europe
Superviser: Rémi Munos, Email: remi.munos@inria.fr,Keywords:
Reinforcement Learning, Random Projections, Compressed Sensing, Least Squares Temporal Difference, Bellman residual minimization.
Subject:
We will consider the
problem of reinforcement learning in high-dimensional function
approximation spaces when the number of features is bigger than the
number of samples. In particular, we will study the Least-Squares
Temporal Difference (LSTD) and Bellman Residual Minimization (BRM)
learning algorithms combined with random projections of the data
onto a low-dimensional random feature space. The low dimensional
space is defined as the span of a small number of linear combinations
of all the initial features weighted by random i.i.d. weights. We will
use the (somehow surprising) property that inner-products between
vectors (as well as their norm) are almost preserved through random
projections (this is a concentration-of-measure phenomenon called
the Johnson-Lindenstrauss Lemma [1]), thus performing least-squares
regression in the "compressed domain" (low-dimensional space) is almost
as good as in the initial high dimensional space, but enables to reduce
the estimation error (and the numerical complexity), see [2,3].
Other information:
This Master internship will be funded by SEQUEL Team.
Possibility to submit the work to an international conference in the field of Statistics / Machine Learning
Possibility to attend french or international events (ex: ICML 2012, COLT 2012).
References
[1] S. Dasgupta and A. Gupta, An elementary proof of the Johnson–Lindenstrauss lemma, Technical report 99–006, U. C. Berkeley, March 1999. See http://www-cse.ucsd.edu/~dasgupta/papers/jl-tr.ps
[2] O. Maillard and R. Munos. Compressed least squares regression. In Proceedings of Advances in Neural Information Processing Systems, 2009. http://hal.archives-ouvertes.fr/inria-00419210/en
[3] O. A. Maillard and R. Munos. Scrambled objects for least-squares regression. In Advances in Neural Information Processing Systems, 2010. http://hal.archives-ouvertes.fr/inria-00483014/fr/
[4] M. Ghavamzadeh, A. Lazaric, O. A. Maillard, and R. Munos. LSTD with random projections. In Advances in Neural Information Processing Systems, 2010. http://chercheurs.lille.inria.fr/~ghavamza/PUBLICATIONS/nips10.pdf