Non-asymptotic Convergence of Adam-type Reinforcement Learning
Algorithms under Markovian Sampling
- URL: http://arxiv.org/abs/2002.06286v2
- Date: Mon, 17 Aug 2020 15:28:43 GMT
- Title: Non-asymptotic Convergence of Adam-type Reinforcement Learning
Algorithms under Markovian Sampling
- Authors: Huaqing Xiong, Tengyu Xu, Yingbin Liang, Wei Zhang
- Abstract summary: This paper provides the first theoretical convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning.
Under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(1/T)$.
Under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(1/T)$.
- Score: 56.394284787780364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the wide applications of Adam in reinforcement learning (RL), the
theoretical convergence of Adam-type RL algorithms has not been established.
This paper provides the first such convergence analysis for two fundamental RL
algorithms of policy gradient (PG) and temporal difference (TD) learning that
incorporate AMSGrad updates (a standard alternative of Adam in theoretical
analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover,
our analysis focuses on Markovian sampling for both algorithms. We show that
under general nonlinear function approximation, PG-AMSGrad with a constant
stepsize converges to a neighborhood of a stationary point at the rate of
$\mathcal{O}(1/T)$ (where $T$ denotes the number of iterations), and with a
diminishing stepsize converges exactly to a stationary point at the rate of
$\mathcal{O}(\log^2 T/\sqrt{T})$. Furthermore, under linear function
approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood
of the global optimum at the rate of $\mathcal{O}(1/T)$, and with a diminishing
stepsize converges exactly to the global optimum at the rate of
$\mathcal{O}(\log T/\sqrt{T})$. Our study develops new techniques for analyzing
the Adam-type RL algorithms under Markovian sampling.
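To make the updates concrete, below is a minimal NumPy sketch of TD-AMSGrad with linear function approximation in the constant-stepsize setting described above. It is not the authors' implementation: the environment interface (env.reset, env.sample_action, env.step), the feature map phi, and all hyperparameter values are illustrative assumptions; only the TD(0) semi-gradient and the AMSGrad moment updates follow their standard definitions.

```python
import numpy as np


def td_amsgrad(env, phi, d, num_steps=10_000, alpha=0.01,
               beta1=0.9, beta2=0.999, eps=1e-8, gamma=0.95, seed=0):
    """TD(0) with AMSGrad updates along a single Markovian trajectory.

    The value estimate is V(s) = theta @ phi(s) for a fixed feature map
    phi: S -> R^d.  A constant stepsize alpha matches the constant-stepsize
    setting of the abstract; env and phi are assumed, not from the paper.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(d)   # linear value-function parameters
    m = np.zeros(d)       # first-moment (momentum) estimate
    v = np.zeros(d)       # second-moment estimate
    v_hat = np.zeros(d)   # running max of v -- the AMSGrad correction

    s = env.reset()
    for _ in range(num_steps):
        a = env.sample_action(s, rng)   # behavior policy (assumed given)
        s_next, r = env.step(s, a)      # Markovian sampling: no i.i.d. restarts

        # TD(0) temporal-difference error and semi-gradient at the current sample.
        delta = r + gamma * theta @ phi(s_next) - theta @ phi(s)
        g = -delta * phi(s)             # descent direction = negative TD update

        # AMSGrad update (Reddi et al., 2018): Adam with a monotone v_hat.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        theta = theta - alpha * m / (np.sqrt(v_hat) + eps)

        s = s_next
    return theta
```

PG-AMSGrad follows the same AMSGrad recipe, with g replaced by a policy-gradient estimate computed along the same Markovian trajectory; the diminishing-stepsize variants analyzed in the paper replace the constant alpha with a decaying schedule.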
Related papers
- MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning.
Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions.
We study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm.
arXiv Detail & Related papers (2024-05-29T18:36:59Z) - Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions.
We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z) - A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous
Q-Learning and TD-Learning Variants [39.28675942566465]
This paper develops a framework to study finite-sample convergence guarantees of a class of value-based asynchronous RL algorithms.
As a by-product, we provide theoretical insights into the bias-variance trade-off, i.e., efficiency of bootstrapping in RL.
arXiv Detail & Related papers (2021-02-02T15:48:19Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence
Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
arXiv Detail & Related papers (2020-10-26T01:33:05Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for the considered step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of the order of $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We provide a simple proof of convergence for both the Adam and Adagrad adaptive optimization algorithms, showing convergence to a stationary point at a rate of $O(d\ln(N)/\sqrt{N})$.
Adam converges with the same $O(d\ln(N)/\sqrt{N})$ rate when used with the default parameters (a minimal sketch of such an Adam step follows this entry).
arXiv Detail & Related papers (2020-03-05T01:56:17Z)
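For comparison with the AMSGrad sketch above, here is a minimal sketch of a single Adam step with the commonly cited default parameters ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$); the stepsize value and variable names are illustrative, not taken from any of the papers listed here.

```python
import numpy as np


def adam_step(theta, g, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with default hyperparameters (Kingma & Ba, 2015).

    Unlike AMSGrad, the bias-corrected second moment v_hat is used directly,
    with no running maximum over past estimates.  t is the 1-based step count.
    """
    m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)           # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The running maximum in AMSGrad keeps the effective per-coordinate stepsizes non-increasing, which is why it serves as the standard surrogate for Adam in theoretical analyses such as the main paper's.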