Gradient Temporal Difference with Momentum: Stability and Convergence
- URL: http://arxiv.org/abs/2111.11004v1
- Date: Mon, 22 Nov 2021 06:17:11 GMT
- Title: Gradient Temporal Difference with Momentum: Stability and Convergence
- Authors: Rohan Deb, Shalabh Bhatnagar
- Abstract summary: We decompose the heavy ball Gradient TD algorithm into three separate iterates with different step sizes.
We prove that the heavy ball Gradient TD algorithm is convergent using our three-timescale SA analysis.
- Score: 4.780898954294901
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient temporal difference (Gradient TD) algorithms are a popular class of
stochastic approximation (SA) algorithms used for policy evaluation in
reinforcement learning. Here, we consider Gradient TD algorithms with an
additional heavy ball momentum term and provide a choice of step size and
momentum parameter that ensures almost sure convergence of these algorithms
asymptotically. In doing so, we decompose the heavy ball Gradient TD iterates
into three separate iterates with different step sizes. We first analyze these
iterates in a one-timescale SA setting using results from the current literature.
However, the one-timescale case is restrictive and a more general analysis can
be provided by looking at a three-timescale decomposition of the iterates. In
the process, we provide the first conditions for stability and convergence of
general three-timescale SA. We then prove that the heavy ball Gradient TD
algorithm is convergent using our three-timescale SA analysis. Finally, we
evaluate these algorithms on standard RL problems and report improvement in
performance over the vanilla algorithms.
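To make the setup concrete, here is a minimal, illustrative sketch of a TDC-style Gradient TD update with an added heavy-ball momentum term and three separate step sizes (alpha, beta, eta). The environment, features, and parameter values are assumptions for illustration only, not the paper's exact algorithm or experimental setup.
```python
"""
Hedged sketch: TDC-style Gradient TD with a heavy-ball momentum term,
run with three step sizes on a synthetic linear policy-evaluation task.
"""
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear policy-evaluation setup: random features, rewards,
# and a fixed Markov transition matrix.
n_states, d, gamma = 20, 5, 0.95
phi = rng.normal(size=(n_states, d))                  # feature matrix
rewards = rng.normal(size=n_states)
P = rng.dirichlet(np.ones(n_states), size=n_states)   # transition matrix

theta = np.zeros(d)      # value-function weights (slow iterate)
w = np.zeros(d)          # gradient-correction weights (fast iterate)
momentum = np.zeros(d)   # heavy-ball momentum iterate
alpha, beta, eta = 1e-2, 1e-1, 0.9  # illustrative step sizes / momentum parameter

s = 0
for t in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    f, f_next, r = phi[s], phi[s_next], rewards[s]

    # TD error under the current weights.
    delta = r + gamma * theta @ f_next - theta @ f

    # TDC gradient estimate plus heavy-ball momentum on the slow iterate.
    grad = delta * f - gamma * (w @ f) * f_next
    momentum = eta * momentum + alpha * grad
    theta = theta + momentum

    # Fast iterate: tracks the projection of the expected TD error.
    w = w + beta * (delta - w @ f) * f

    s = s_next
```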
Related papers
- Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic
Approximation with Markovian Noise [9.82187447690297]
We characterize a tight convergence bound for the iterations of linear two-time-scale stochastic approximation (SA) with Markovian noise.
Our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.
arXiv Detail & Related papers (2023-12-31T01:30:14Z) - Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we present the first unified study of variance reduction techniques for stochastic proximal point algorithms.
We introduce a generic proximal-based algorithm that can be specialized to recover proximal versions of SVRG, SAGA, and some of their variants.
Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
arXiv Detail & Related papers (2023-08-18T05:11:50Z) - Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdient algoRitHm (SARAH) is a variance-reduced variant of the Stochastic Gradient Descent (SGD) algorithm.
In this paper, we remove the necessity of a full gradient computation.
The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm (a sketch illustrating this idea appears after this list).
arXiv Detail & Related papers (2021-11-26T06:00:44Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point using an optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence
Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
arXiv Detail & Related papers (2020-10-26T01:33:05Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point at each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Multi-kernel Passive Stochastic Gradient Algorithms and Transfer
Learning [21.796874356469644]
The passive stochastic gradient algorithm does not have control over the locations where noisy gradients of the cost function are evaluated.
The algorithm performs substantially better in high dimensional problems and incorporates variance reduction.
arXiv Detail & Related papers (2020-08-23T11:55:19Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad class of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Proximal Gradient Temporal Difference Learning: Stable Reinforcement
Learning with Polynomial Sample Complexity [40.73281056650241]
We introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true gradient temporal difference learning algorithms.
We show how gradient TD reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function.
arXiv Detail & Related papers (2020-06-06T21:04:21Z)
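As a companion to the SARAH-related entry above, the following is a hedged sketch of the classic SARAH recursion on a toy least-squares problem, with the periodic full-gradient pass replaced by the average of the per-sample gradients accumulated during the previous reshuffled epoch, in the spirit of that summary. All problem details are assumptions for illustration; this is not the paper's exact scheme.
```python
"""
Hedged sketch: SARAH-style recursive gradient estimation where aggregated
per-sample gradients stand in for a fresh full-gradient computation.
"""
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    # Gradient of 0.5 * (A[i] @ x - b[i])**2 for a single sample i.
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
eta = 0.05
# One-time full gradient to initialize the estimator.
v = np.mean([grad_i(x, i) for i in range(n)], axis=0)

for epoch in range(30):
    perm = rng.permutation(n)
    epoch_grads = []
    for i in perm:
        x_prev = x.copy()
        x = x - eta * v
        g = grad_i(x, i)
        epoch_grads.append(g)
        # SARAH recursion: correct the estimate with the gradient difference.
        v = g - grad_i(x_prev, i) + v
    # Aggregated per-sample gradients replace a fresh full-gradient pass.
    v = np.mean(epoch_grads, axis=0)
```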