Finite-Sample Analysis for Two Time-scale Non-linear TDC with General
Smooth Function Approximation
- URL: http://arxiv.org/abs/2104.02836v1
- Date: Wed, 7 Apr 2021 00:34:11 GMT
- Title: Finite-Sample Analysis for Two Time-scale Non-linear TDC with General
Smooth Function Approximation
- Authors: Yue Wang, Shaofeng Zou, Yi Zhou
- Abstract summary: We develop novel techniques to explicitly characterize the finite-sample error bound for the general off-policy setting.
Our approach can be applied to a wide range of value-based reinforcement learning algorithms with general smooth function approximation.
- Score: 27.149240954363496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal-difference learning with gradient correction (TDC) is a two
time-scale algorithm for policy evaluation in reinforcement learning. This
algorithm was initially proposed with linear function approximation, and was
later extended to general smooth function approximation. The asymptotic
convergence for the on-policy setting with general smooth function
approximation was established by Bhatnagar et al. (2009); however, the
finite-sample analysis remains open due to challenges in the non-linear
two-time-scale update structure, the non-convex objective function, and the
time-varying projection onto a tangent plane. In this paper, we develop novel
techniques to explicitly characterize the finite-sample error bound for the
general off-policy setting with i.i.d. or Markovian samples, and show that it
converges as fast as $\mathcal{O}(1/\sqrt{T})$ (up to a factor of
$\mathcal{O}(\log T)$). Our approach can be applied to a wide range of
value-based reinforcement learning algorithms with general smooth function
approximation.
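For readers unfamiliar with TDC, the sketch below illustrates the two time-scale structure the abstract refers to, using the classical off-policy linear special case with importance weighting: the value-function weights move on the slow time-scale, while the auxiliary vector that forms the gradient-correction term is tracked on the fast one. This is a minimal illustration, not the paper's general smooth-function algorithm; the function name and toy interface are our own.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One off-policy TDC update for the linear special case
    V_theta(s) = phi(s)^T theta; rho is the importance-sampling ratio."""
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    # Slow time-scale: semi-gradient step plus the gradient-correction term
    theta_new = theta + alpha * rho * (delta * phi - gamma * (phi @ w) * phi_next)
    # Fast time-scale (beta >> alpha): w tracks a least-squares fit of delta
    w_new = w + beta * rho * (delta - phi @ w) * phi
    return theta_new, w_new
```

In the general smooth case analyzed in the paper, the feature vectors are replaced by gradients of the value function and the fast iterate is projected onto a time-varying tangent plane, which is exactly where the finite-sample analysis becomes delicate.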
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by at most $O(\alpha^{-1})$, measured in the Wasserstein distance.
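As a point of reference for the update being analyzed, here is plain deterministic gradient descent-ascent on a toy convex-concave objective; the objective, step size, and dimensions are our own illustration rather than the paper's mean-field neural-network setting.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
lam, eta = 1.0, 0.05
x, y = rng.standard_normal(3), rng.standard_normal(3)

# min_x max_y  f(x, y) = x^T A y + (lam/2)||x||^2 - (lam/2)||y||^2
for _ in range(2000):
    grad_x = A @ y + lam * x      # gradient for the descent (min) player
    grad_y = A.T @ x - lam * y    # gradient for the ascent (max) player
    x, y = x - eta * grad_x, y + eta * grad_y

print(np.linalg.norm(x), np.linalg.norm(y))  # both shrink toward the saddle at 0
```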
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
- Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps [77.8999425439444]
We present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems.
Our work is based on the fact that the update rule of the Proximal Point method can be approximated up to arbitrary accuracy by iterating a contraction map.
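A minimal sketch of that idea on a toy bilinear game of our own choosing: one proximal-point step solves an implicit equation, and because the associated map is a contraction when the step size is small relative to the Lipschitz constant, a handful of fixed-point iterations approximate it.

```python
import numpy as np

def F(z):
    """Gradient field of the bilinear game min_x max_y x^T y, with z = (x, y)."""
    x, y = z[:3], z[3:]
    return np.concatenate([y, -x])

eta, z = 0.5, np.ones(6)
for _ in range(200):
    # A proximal-point step solves u = z - eta * F(u). F is 1-Lipschitz and
    # eta < 1, so u -> z - eta * F(u) is a contraction and a few fixed-point
    # iterations approximate the implicit update.
    u = z
    for _ in range(10):
        u = z - eta * F(u)
    z = u

print(np.linalg.norm(z))  # shrinks toward 0, the saddle point of x^T y
```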
arXiv Detail & Related papers (2023-01-10T12:18:47Z)
- Finite-Time Error Bounds for Greedy-GQ [20.51105692499517]
We characterize the finite-time error of the Greedy-GQ algorithm and show how fast it converges.
Our analysis provides guidance on choosing step-sizes for faster convergence.
arXiv Detail & Related papers (2022-09-06T15:04:57Z)
- A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
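The structure is easiest to see on a scalar toy problem; the quantities and step-size exponents below are our own choices, not the paper's. A fast iterate tracks an unknown quantity from noisy samples, while a slow iterate runs gradient descent against the fast iterate's current value.

```python
import numpy as np

rng = np.random.default_rng(2)
c_true = 3.0        # unknown quantity, observed only through noise
x, y = 0.0, 0.0     # x: fast auxiliary iterate, y: slow decision variable

for t in range(1, 20001):
    alpha = 1.0 / t ** 0.9      # slow step size
    beta = 1.0 / t ** 0.6       # fast step size (decays more slowly)
    sample = c_true + rng.standard_normal()
    x += beta * (sample - x)    # fast: running estimate of c_true
    y -= alpha * (y - x)        # slow: gradient step on 0.5 * (y - x)^2,
                                # treating the fast iterate as if it were fixed

print(x, y)  # both approach c_true = 3.0
```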
arXiv Detail & Related papers (2021-09-29T23:15:23Z)
- Improving the Transient Times for Distributed Stochastic Gradient Methods [5.215491794707911]
We study a distributed stochastic gradient algorithm called exact diffusion with adaptive stepsizes (EDAS).
We show that EDAS achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD).
To the best of our knowledge, EDAS achieves the shortest transient time when the average of the $n$ cost functions is strongly convex.
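For context, the base exact-diffusion recursion (adapt, correct, combine) can be sketched as below; the adaptive step-size schedule that gives EDAS its name is replaced by a fixed mu, and the names are ours, so this illustrates the recursion rather than the paper's algorithm.

```python
import numpy as np

def exact_diffusion_round(X, Psi_prev, grads, W_bar, mu):
    """One round of exact diffusion: adapt, correct, combine.

    X        : (n, d) current iterates, one row per agent
    Psi_prev : (n, d) adapt-step outputs from the previous round
    grads    : (n, d) stochastic gradients of each agent's local cost at X
    W_bar    : (n, n) symmetric doubly-stochastic mixing matrix
    mu       : step size (fixed here; EDAS adapts it over time)
    """
    Psi = X - mu * grads        # adapt: local stochastic-gradient step
    Phi = Psi + X - Psi_prev    # correct: cancels the steady-state bias of
                                # plain diffusion methods
    X_next = W_bar @ Phi        # combine: mix with network neighbors
    return X_next, Psi
```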
arXiv Detail & Related papers (2021-05-11T08:09:31Z)
- Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint [8.087699764574788]
We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning context.
Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme.
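A smoothed-functional scheme estimates gradients from function evaluations alone by perturbing the parameters along random directions. Below is a generic two-point Gaussian-smoothing estimator applied to a toy objective; the estimator form is standard, while the objective and constants are our own stand-ins for the paper's off-policy performance function.

```python
import numpy as np

def sf_gradient(f, theta, delta, rng):
    """Two-point smoothed-functional gradient estimate of f at theta.

    Perturbs theta along a random Gaussian direction u; the estimate is
    unbiased for the gradient of the smoothed function E_u[f(theta + delta*u)].
    """
    u = rng.standard_normal(theta.shape)
    return (f(theta + delta * u) - f(theta - delta * u)) / (2.0 * delta) * u

rng = np.random.default_rng(3)
f = lambda th: float(np.sum((th - 1.0) ** 2))  # toy stand-in objective
theta = np.zeros(4)
for _ in range(5000):
    theta -= 0.01 * sf_gradient(f, theta, delta=0.1, rng=rng)

print(theta)  # approaches the minimizer (1, 1, 1, 1)
```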
arXiv Detail & Related papers (2021-01-06T17:06:42Z)
- Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms [65.09383385484007]
Two timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms.
We study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ algorithms.
arXiv Detail & Related papers (2020-11-10T11:36:30Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
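A minimal sketch of the two-timescale idea on a quadratic bilevel toy problem of our own construction: the inner (fast) iterate tracks the lower-level solution, while the outer (slow) iterate descends an approximate hypergradient evaluated at the current inner iterate instead of the exact lower-level minimizer.

```python
# Outer objective f(x, y) = 0.5*(y - 1)**2 + 0.5*x**2 and inner objective
# g(x, y) = 0.5*(y - x)**2, so y*(x) = x and F(x) = f(x, y*(x)) is minimized
# at x = 0.5. For this quadratic instance the hypergradient is x + (y* - 1).
x, y = 0.0, 5.0
for k in range(1, 5001):
    alpha = 0.5 / k ** 0.75    # slow (outer) step size
    beta = 0.5 / k ** 0.5      # fast (inner) step size
    y -= beta * (y - x)                # inner: gradient step toward y*(x) = x
    x -= alpha * (x + (y - 1.0))       # outer: hypergradient step, using the
                                       # current y in place of the exact y*(x)

print(x, y)  # both approach 0.5
```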
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
- Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
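To make the two time-scale execution concrete, here is a tabular actor-critic sketch on a two-state toy MDP of our own design: the critic's TD update uses the larger (fast) step size and the actor's policy-gradient update the smaller (slow) one. It illustrates the update pattern only, not the paper's NAC variant or its sample-complexity analysis.

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, n_a, gamma = 2, 2, 0.9
# Toy MDP: action 1 moves to state 1 and pays reward 1; action 0 moves to
# state 0 and pays reward 0, from either state.
P = np.zeros((n_s, n_a, n_s))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])

theta = np.zeros((n_s, n_a))   # actor: softmax policy parameters (slow)
V = np.zeros(n_s)              # critic: tabular state values (fast)
s = 0
for t in range(1, 50001):
    alpha, beta = 0.5 / t ** 0.9, 0.5 / t ** 0.6   # actor step << critic step
    probs = np.exp(theta[s] - theta[s].max())
    probs /= probs.sum()
    a = rng.choice(n_a, p=probs)
    s_next = rng.choice(n_s, p=P[s, a])
    delta = R[s, a] + gamma * V[s_next] - V[s]     # TD error
    V[s] += beta * delta                           # fast critic update
    grad_log = -probs
    grad_log[a] += 1.0                             # grad of log pi(a|s) wrt theta[s]
    theta[s] += alpha * delta * grad_log           # slow actor update
    s = s_next

print(theta)   # probability mass shifts toward action 1 in both states
```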
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.