Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise
- URL: http://arxiv.org/abs/2005.10175v1
- Date: Wed, 20 May 2020 16:35:19 GMT
- Title: Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise
- Authors: Yue Wang and Shaofeng Zou
- Abstract summary: This paper develops the first finite-sample analysis for the Greedy-GQ algorithm.
Our paper extends the finite-sample analyses of two timescale reinforcement learning algorithms.
- Score: 23.62008807533706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Greedy-GQ is an off-policy two timescale algorithm for optimal control in
reinforcement learning. This paper develops the first finite-sample analysis
for the Greedy-GQ algorithm with linear function approximation under Markovian
noise. Our finite-sample analysis provides theoretical justification for
choosing stepsizes for this two timescale algorithm for faster convergence in
practice, and suggests a trade-off between the convergence rate and the quality
of the obtained policy. Our paper extends the finite-sample analyses of two
timescale reinforcement learning algorithms from policy evaluation to optimal
control, which is of more practical interest. Specifically, in contrast to
existing finite-sample analyses for two timescale methods, e.g., GTD, GTD2 and
TDC, where their objective functions are convex, the objective function of the
Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also
not a linear two-timescale stochastic approximation algorithm. Our techniques
in this paper provide a general framework for finite-sample analysis of
non-convex value-based reinforcement learning algorithms for optimal control.
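To make the two-timescale structure concrete, below is a minimal sketch of the Greedy-GQ update with linear function approximation, following the standard form of the algorithm (Maei et al., 2010). The random MDP, one-hot features, and step-size exponents are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the Greedy-GQ two-timescale update; all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # random transition kernel P[s, a]
R = rng.uniform(size=(nS, nA))                 # random rewards

d = nS * nA                                    # one-hot features phi(s, a)
def phi(s, a):
    x = np.zeros(d)
    x[s * nA + a] = 1.0
    return x

theta = np.zeros(d)  # slow timescale: Q-function weights
w = np.zeros(d)      # fast timescale: auxiliary correction weights
s = 0
for t in range(1, 20001):
    alpha, beta = 0.5 / t ** 0.9, 0.5 / t ** 0.6  # beta decays slower: faster timescale
    a = int(rng.integers(nA))                     # uniform behavior policy (off-policy)
    s_next = int(rng.choice(nS, p=P[s, a]))       # one Markovian trajectory, no resets
    q_next = [theta @ phi(s_next, b) for b in range(nA)]
    a_star = int(np.argmax(q_next))               # greedy action under current theta
    f, f_bar = phi(s, a), phi(s_next, a_star)
    delta = R[s, a] + gamma * q_next[a_star] - theta @ f   # TD error
    theta += alpha * (delta * f - gamma * (w @ f) * f_bar) # gradient-corrected step
    w += beta * (delta - w @ f) * f
    s = s_next
```

The auxiliary weights w are updated with the larger stepsize and so run on the faster timescale; the paper's finite-sample analysis concerns how fast this coupled iteration converges along a single Markovian trajectory.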
Related papers
- Deterministic Trajectory Optimization through Probabilistic Optimal Control [3.2771631221674333]
We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that applying these algorithms results in a fixed-point iteration over probabilistic policies that converges to the deterministic optimal policy.
arXiv Detail & Related papers (2024-07-18T09:17:47Z)
- Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation [11.345796608258434]
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes.
We analyze two algorithms; the first involves a least-squares approach, where a new set of weights associated with feature vectors is obtained via least squares at each iteration (a minimal sketch follows this entry).
The second is a two-time-scale stochastic approximation algorithm that takes several steps of gradient descent towards the least-squares solution.
arXiv Detail & Related papers (2022-10-13T20:16:19Z)
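A minimal sketch of the per-iteration least-squares step described in this entry, with synthetic features and return estimates standing in for simulation data; all names here are illustrative.

```python
# Fit value weights to simulated returns by least squares (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 8
Phi = rng.normal(size=(n, d))       # feature vectors of sampled state-actions
returns = rng.normal(size=n)        # unbiased simulated return estimates
w, *_ = np.linalg.lstsq(Phi, returns, rcond=None)  # least-squares value weights
values = Phi @ w                    # approximate values used for the policy update
```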
- Finite-Time Error Bounds for Greedy-GQ [20.51105692499517]
We establish finite-time error bounds showing that the Greedy-GQ algorithm converges fast.
Our analysis provides guidance on choosing step-sizes for faster convergence (illustrative schedules follow this entry).
arXiv Detail & Related papers (2022-09-06T15:04:57Z)
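For illustration, a sketch of the polynomial step-size schedules common in finite-time analyses of two-timescale algorithms; the exponents below are assumptions for this sketch, not the values prescribed in the paper.

```python
# Polynomial two-timescale schedules: the fast stepsize (auxiliary update)
# decays more slowly than the slow one. Exponents are illustrative assumptions.
def stepsizes(t, nu=0.9, sigma=0.6, c_alpha=0.5, c_beta=0.5):
    alpha_t = c_alpha / (1 + t) ** nu     # slow timescale (main iterate)
    beta_t = c_beta / (1 + t) ** sigma    # fast timescale (auxiliary iterate)
    return alpha_t, beta_t
```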
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish lower bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms through an experimental comparison with existing state-of-the-art methods (a baseline sketch of the setting follows this entry).
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
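As a baseline sketch of the problem setting in this entry (not the paper's optimal algorithms, which attain the lower bounds): decentralized gradient descent over a fixed ring network with local strongly convex quadratics, all of which are illustrative assumptions.

```python
# Each node mixes its iterate with neighbors and takes a local gradient step.
import numpy as np

rng = np.random.default_rng(2)
n_nodes, d, alpha = 6, 4, 0.05
A = [rng.normal(size=(d, d)) for _ in range(n_nodes)]
Q = [a.T @ a + np.eye(d) for a in A]   # local f_i(x) = 0.5 x^T Q_i x - b_i^T x
b = [rng.normal(size=d) for _ in range(n_nodes)]

W = np.zeros((n_nodes, n_nodes))       # doubly stochastic mixing matrix (ring)
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

X = np.zeros((n_nodes, d))             # one local iterate per node
for _ in range(500):
    grads = np.stack([Q[i] @ X[i] - b[i] for i in range(n_nodes)])
    X = W @ X - alpha * grads          # gossip averaging plus local gradient step
```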
- Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, together with off-policy learning and function approximation, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z)
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint [8.087699764574788]
We propose two policy gradient algorithms for the control problem in an off-policy reinforcement learning setting.
Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme (a minimal sketch follows this entry).
arXiv Detail & Related papers (2021-01-06T17:06:42Z)
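A minimal sketch of a two-sided Gaussian smoothed functional (SF) gradient estimate of the kind this entry refers to: perturb the parameter with Gaussian noise and difference function evaluations. The quadratic objective is a hypothetical stand-in; in the paper the objective is a policy's performance estimated off-policy.

```python
# Two-sided Gaussian smoothed-functional gradient estimate.
import numpy as np

rng = np.random.default_rng(3)

def sf_gradient(f, theta, beta=0.1, n_samples=64):
    g = np.zeros_like(theta)
    for _ in range(n_samples):
        eta = rng.normal(size=theta.shape)   # Gaussian perturbation direction
        g += (f(theta + beta * eta) - f(theta - beta * eta)) / (2 * beta) * eta
    return g / n_samples

theta = np.ones(5)
g = sf_gradient(lambda x: float(np.sum(x ** 2)), theta)  # approx. 2 * theta
```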
- Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms [65.09383385484007]
Two-timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rates of the two-timescale linear and nonlinear TDC and Greedy-GQ algorithms (a TDC update sketch follows this entry).
arXiv Detail & Related papers (2020-11-10T11:36:30Z)
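For reference, a minimal single-transition TDC update under linear function approximation; the stepsizes and data are illustrative.

```python
# TDC update for one transition (f, r, f_next) with linear features.
import numpy as np

def tdc_update(theta, w, f, r, f_next, gamma=0.9, alpha=0.01, beta=0.1):
    delta = r + gamma * f_next @ theta - f @ theta                  # TD error
    theta = theta + alpha * (delta * f - gamma * (w @ f) * f_next)  # corrected TD step
    w = w + beta * (delta - w @ f) * f                              # fast auxiliary update
    return theta, w

theta, w = tdc_update(np.zeros(4), np.zeros(4), np.eye(4)[0], 1.0, np.eye(4)[1])
```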
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Finite-Sample Analysis of Proximal Gradient TD Algorithms [43.035055641190105]
We analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.
Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP (a GTD2 update sketch follows this entry).
The results of our theoretical analysis show that the GTD family of algorithms are indeed comparable to the existing LSTD methods in off-policy learning scenarios.
arXiv Detail & Related papers (2020-06-06T20:16:25Z)
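A minimal GTD2 update for one transition, showing the structure of the GTD family; the projected and mirror-prox (GTD2-MP) variants mentioned above add a projection or extragradient step on top of this. Stepsizes and data are illustrative.

```python
# GTD2 update: shares the fast auxiliary update with TDC but differs in the main step.
import numpy as np

def gtd2_update(theta, w, f, r, f_next, gamma=0.9, alpha=0.01, beta=0.1):
    delta = r + gamma * f_next @ theta - f @ theta          # TD error
    theta = theta + alpha * (f - gamma * f_next) * (w @ f)  # GTD2 main step
    w = w + beta * (delta - w @ f) * f                      # fast auxiliary update
    return theta, w
```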
- Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
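A minimal sketch of the perturbation mechanism this entry refers to: gradient descent with occasional isotropic noise escaping a strict saddle point. The test function and constants are illustrative.

```python
# f(x, y) = x**4/4 - x**2/2 + y**2/2 has a strict saddle at the origin
# and minima at x = +/-1; noise lets gradient descent leave the saddle.
import numpy as np

rng = np.random.default_rng(4)

def grad(v):
    return np.array([v[0] ** 3 - v[0], v[1]])

x = np.zeros(2)                  # start exactly at the saddle point
for t in range(300):
    if t % 50 == 0:
        x = x + 0.01 * rng.normal(size=2)   # occasional isotropic perturbation
    x = x - 0.1 * grad(x)                   # plain gradient step
# x[0] ends near +/-1 (a local minimum) rather than staying at the saddle
```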
- Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best available convergence rate for non-PL objectives while simultaneously outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.