Proximal Gradient Temporal Difference Learning: Stable Reinforcement
Learning with Polynomial Sample Complexity
- URL: http://arxiv.org/abs/2006.03976v1
- Date: Sat, 6 Jun 2020 21:04:21 GMT
- Title: Proximal Gradient Temporal Difference Learning: Stable Reinforcement
Learning with Polynomial Sample Complexity
- Authors: Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan,
Marek Petrik
- Abstract summary: We introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true gradient temporal difference learning algorithms.
We show how gradient TD reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function.
- Score: 40.73281056650241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce proximal gradient temporal difference learning,
which provides a principled way of designing and analyzing true stochastic
gradient temporal difference learning algorithms. We show how gradient TD (GTD)
reinforcement learning methods can be formally derived, not by starting from
their original objective functions, as previously attempted, but rather from a
primal-dual saddle-point objective function. We also conduct a saddle-point
error analysis to obtain finite-sample bounds on their performance. Previous
analyses of this class of algorithms use stochastic approximation techniques to
prove asymptotic convergence, and do not provide any finite-sample analysis. We
also propose an accelerated algorithm, called GTD2-MP, that uses proximal
"mirror maps" to yield an improved convergence rate. The results of our
theoretical analysis imply that the GTD family of algorithms are comparable and
may indeed be preferred over existing least squares TD methods for off-policy
learning, due to their linear complexity. We provide experimental results
showing the improved performance of our accelerated gradient TD methods.
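To make the derivation concrete, the saddle-point reformulation of the mean-square projected Bellman error (MSPBE) can be sketched as follows. This is the standard construction for linear off-policy TD (features $\phi, \phi'$ for the current and next state, reward $r$, importance weight $\rho$, discount $\gamma$); the paper's own notation may differ in details.

```latex
% Expectations are over off-policy transitions (s, a, r, s').
A = \mathbb{E}\left[\rho\,\phi(\phi - \gamma\phi')^{\top}\right], \qquad
b = \mathbb{E}\left[\rho\, r\,\phi\right], \qquad
C = \mathbb{E}\left[\phi\phi^{\top}\right]

% The MSPBE is a weighted quadratic; its convex conjugate turns
% minimization into a primal-dual saddle-point problem:
\mathrm{MSPBE}(\theta)
  = \tfrac{1}{2}\,\bigl\lVert A\theta - b\bigr\rVert_{C^{-1}}^{2}
  = \max_{y}\,\Bigl(\langle A\theta - b,\, y\rangle
      - \tfrac{1}{2}\, y^{\top} C\, y\Bigr)
```

Minimizing the MSPBE is therefore equivalent to $\min_\theta \max_y \mathcal{L}(\theta, y)$, and applying a stochastic primal-dual (proximal) gradient method to this objective recovers GTD2-style updates, with the auxiliary GTD weight vector playing the role of the dual variable.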
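Below is a minimal NumPy sketch of a single GTD2-MP update under the Euclidean mirror map, in which the proximal mirror-map update reduces to an extragradient (predict-then-correct) step. The function name, step sizes, and feature conventions are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def gtd2_mp_step(theta, w, phi, phi_next, reward, rho,
                 gamma=0.99, alpha=0.05, beta=0.05):
    """One GTD2-MP update from a single off-policy transition.

    theta: primal weights of the linear value function V(s) = phi(s) @ theta
    w:     auxiliary (dual) weights; rho is the importance-sampling ratio.
    """
    # TD error at the current iterate.
    delta = reward + gamma * (phi_next @ theta) - phi @ theta

    # Extrapolation (midpoint) step of the saddle-point update.
    theta_mid = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ w)
    w_mid = w + beta * (rho * delta - phi @ w) * phi

    # TD error re-evaluated at the midpoint iterate.
    delta_mid = reward + gamma * (phi_next @ theta_mid) - phi @ theta_mid

    # Correction step: update the original iterates with midpoint gradients.
    theta_new = theta + alpha * rho * (phi - gamma * phi_next) * (phi @ w_mid)
    w_new = w + beta * (rho * delta_mid - phi @ w_mid) * phi
    return theta_new, w_new

# Illustrative usage with random 4-dimensional features.
rng = np.random.default_rng(0)
theta, w = np.zeros(4), np.zeros(4)
phi, phi_next = rng.standard_normal(4), rng.standard_normal(4)
theta, w = gtd2_mp_step(theta, w, phi, phi_next, reward=1.0, rho=1.0)
```

Each transition costs O(d) per update, which is the linear complexity the abstract contrasts with the O(d^2) cost of least-squares TD methods.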
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}(\ln(T)/T^{1-\frac{1}{\alpha}})$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
- Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we propose the first unified study of variance reduction techniques for proximal point algorithms.
We introduce a generic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants.
Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
arXiv Detail & Related papers (2023-08-18T05:11:50Z)
- Gradient Descent Temporal Difference-difference Learning [0.0]
We propose gradient descent temporal difference-difference (Gradient-DD) learning in order to improve GTD2, a GTD algorithm.
We study the model empirically on the random walk task, the Boyan-chain task, and Baird's off-policy counterexample.
arXiv Detail & Related papers (2022-09-10T08:55:20Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Leveraging Non-uniformity in First-order Non-convex Optimization [93.6817946818977]
Non-uniform refinement of objective functions leads to Non-uniform Smoothness (NS) and the Non-uniform Łojasiewicz inequality (NL).
These new definitions inspire geometry-aware first-order methods that converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bounds.
arXiv Detail & Related papers (2021-05-13T04:23:07Z)
- Simple and optimal methods for stochastic variational inequalities, II:
Markovian noise and policy evaluation in reinforcement learning [9.359939442911127]
This paper focuses on stochastic variational inequalities (VI) under Markovian noise.
A prominent application of our algorithmic developments is the policy evaluation problem in reinforcement learning.
arXiv Detail & Related papers (2020-11-15T04:05:22Z)
- A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt integral quadratic constraints (IQC) theory to first-order methods for smooth and strongly-monotone games.
We provide, for the first time, a global convergence rate for the negative momentum method (NM) with a complexity of $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Finite-Sample Analysis of Proximal Gradient TD Algorithms [43.035055641190105]
We analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.
Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP.
The results of our theoretical analysis show that the GTD family of algorithms are indeed comparable to the existing LSTD methods in off-policy learning scenarios.
arXiv Detail & Related papers (2020-06-06T20:16:25Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze Optimistic Adagrad (OAdagrad), a variant of the optimistic stochastic gradient method, for non-concave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.