Proximal Gradient Temporal Difference Learning: Stable Reinforcement
Learning with Polynomial Sample Complexity
- URL: http://arxiv.org/abs/2006.03976v1
- Date: Sat, 6 Jun 2020 21:04:21 GMT
- Title: Proximal Gradient Temporal Difference Learning: Stable Reinforcement
Learning with Polynomial Sample Complexity
- Authors: Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan,
Marek Petrik
- Abstract summary: We introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true gradient temporal difference learning algorithms.
We show how gradient TD reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function.
- Score: 40.73281056650241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce proximal gradient temporal difference learning,
which provides a principled way of designing and analyzing true stochastic
gradient temporal difference learning algorithms. We show how gradient TD (GTD)
reinforcement learning methods can be formally derived, not by starting from
their original objective functions, as previously attempted, but rather from a
primal-dual saddle-point objective function. We also conduct a saddle-point
error analysis to obtain finite-sample bounds on their performance. Previous
analyses of this class of algorithms use stochastic approximation techniques to
prove asymptotic convergence, and do not provide any finite-sample analysis. We
also propose an accelerated algorithm, called GTD2-MP, that uses proximal
"mirror maps" to yield an improved convergence rate. Our theoretical analysis
implies that the GTD family of algorithms is comparable to, and may indeed be
preferred over, existing least-squares TD methods for off-policy learning,
owing to its linear computational complexity. We provide experimental results
showing the improved performance of our accelerated gradient TD methods.
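To make the primal-dual structure concrete, the sketch below implements the standard GTD2 update (Sutton et al.'s gradient TD method), which the abstract says can be re-derived from a saddle-point objective: a fast "dual" weight vector w tracks the expected TD error given the features, while the slow "primal" parameter theta follows the corrected gradient. The feature stream, step sizes, and dimensions are illustrative assumptions, not values from the paper; note each update costs O(d) per transition, which is the linear complexity the abstract contrasts with least-squares TD.

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update from a single transition (phi, reward, phi_next).

    theta: primal (slow) value-function weights.
    w:     dual (fast) weights estimating E[TD error | features].
    All operations are O(d), d = feature dimension.
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    # Dual ascent step: regress the TD error onto the features.
    w = w + beta * (delta - phi @ w) * phi
    # Primal descent step: gradient-corrected TD update.
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    return theta, w

# Toy usage on a synthetic transition stream (illustrative only).
rng = np.random.default_rng(0)
d = 4
theta, w = np.zeros(d), np.zeros(d)
for _ in range(1000):
    phi = rng.standard_normal(d)
    phi_next = rng.standard_normal(d)
    theta, w = gtd2_step(theta, w, phi, phi_next,
                         reward=rng.standard_normal(),
                         gamma=0.9, alpha=0.01, beta=0.02)
```

GTD2-MP, as described in the abstract, accelerates this scheme by replacing the plain gradient steps with proximal mirror-map updates; the two-timescale primal-dual shape of the iteration is the same.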