Finite-Sample Analysis of Proximal Gradient TD Algorithms
- URL: http://arxiv.org/abs/2006.14364v2
- Date: Fri, 3 Jul 2020 14:51:47 GMT
- Title: Finite-Sample Analysis of Proximal Gradient TD Algorithms
- Authors: Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik
- Abstract summary: We analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.
Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP.
The results of our theoretical analysis show that the GTD family of algorithms is indeed comparable to the existing LSTD methods in off-policy learning scenarios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we analyze the convergence rate of the gradient temporal
difference learning (GTD) family of algorithms. Previous analyses of this class
of algorithms use ODE techniques to prove asymptotic convergence, and to the
best of our knowledge, no finite-sample analysis has been done. Moreover, there
has been little work on finite-sample analysis for convergent off-policy
reinforcement learning algorithms. In this paper, we formulate GTD methods as
stochastic gradient algorithms w.r.t. a primal-dual saddle-point objective
function, and then conduct a saddle-point error analysis to obtain
finite-sample bounds on their performance. Two revised algorithms are also
proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence
guarantees and acceleration, respectively. The results of our theoretical
analysis show that the GTD family of algorithms is indeed comparable to the
existing LSTD methods in off-policy learning scenarios.
Related papers
- Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation [5.152147416671501]
This paper analyzes multi-step TD-learning algorithms characterized by linear function approximation, off-policy learning, and bootstrapping.
Two n-step TD-learning algorithms are proposed and analyzed, which can be seen as the model-free reinforcement learning counterparts of the gradient and control theoretic algorithms.
arXiv Detail & Related papers (2024-02-24T10:42:50Z) - Gradient Descent Temporal Difference-difference Learning [0.0]
We propose gradient descent temporal difference-difference (Gradient-DD) learning to improve GTD2, a GTD algorithm.
We study the method empirically on the random walk task, the Boyan-chain task, and Baird's off-policy counterexample.
arXiv Detail & Related papers (2022-09-10T08:55:20Z) - First-Order Algorithms for Nonlinear Generalized Nash Equilibrium
Problems [88.58409977434269]
We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs).
Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method.
We provide nonasymptotic theoretical guarantees for these algorithms.
arXiv Detail & Related papers (2022-04-07T00:11:05Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation (FA), forms the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint [8.087699764574788]
We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning context.
Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme.
arXiv Detail & Related papers (2021-01-06T17:06:42Z) - Sample Complexity Bounds for Two Timescale Value-based Reinforcement
Learning Algorithms [65.09383385484007]
Two-timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rates of two-timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z) - Proximal Gradient Temporal Difference Learning: Stable Reinforcement
Learning with Polynomial Sample Complexity [40.73281056650241]
We introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true gradient temporal difference learning algorithms.
We show how gradient TD reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function.
arXiv Detail & Related papers (2020-06-06T21:04:21Z) - Finite-sample Analysis of Greedy-GQ with Linear Function Approximation
under Markovian Noise [23.62008807533706]
This paper develops the first finite-sample analysis for the Greedy-GQ algorithm.
Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms.
arXiv Detail & Related papers (2020-05-20T16:35:19Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence guarantees for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.