Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
- URL: http://arxiv.org/abs/2102.09318v1
- Date: Thu, 18 Feb 2021 13:22:59 GMT
- Title: Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm
- Authors: Sajad Khodadadian, Zaiwei Chen, and Siva Theja Maguluri
- Abstract summary: We provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling.
We show that the algorithm converges to a global optimal policy with a sample complexity of $\mathcal{O}(\epsilon^{-3}\log^2(1/\epsilon))$ under an appropriate choice of stepsizes.
- Score: 4.932130498861987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we provide finite-sample convergence guarantees for an
off-policy variant of the natural actor-critic (NAC) algorithm based on
Importance Sampling. In particular, we show that the algorithm converges to a
global optimal policy with a sample complexity of
$\mathcal{O}(\epsilon^{-3}\log^2(1/\epsilon))$ under an appropriate choice of
stepsizes. In order to overcome the issue of large variance due to Importance
Sampling, we propose the $Q$-trace algorithm for the critic, which is inspired
by the V-trace algorithm (Espeholt et al., 2018). This enables us to explicitly
control the bias and variance, and characterize the trade-off between them. As
an advantage of off-policy sampling, a major feature of our result is that we
do not need any additional assumptions, beyond the ergodicity of the Markov
chain induced by the behavior policy.
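The Q-trace critic described above is an n-step TD update whose importance-sampling ratios are clipped at two separate thresholds, which is what gives explicit control over the bias/variance trade-off. Below is a minimal tabular sketch, assuming a finite state/action space and a trajectory collected under the behavior policy; the function name `qtrace_targets`, the argument names, and the exact placement of the clipped ratios are illustrative assumptions, not the paper's verbatim update rule.

```python
import numpy as np

def qtrace_targets(q, traj, target_policy, behavior_policy,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of Q-trace-style n-step critic targets from off-policy data.

    q:               current Q-table, shape (num_states, num_actions)
    traj:            list of (state, action, reward, next_state) tuples
                     collected under the behavior policy
    target_policy:   array (num_states, num_actions) with pi(a|s)
    behavior_policy: array (num_states, num_actions) with mu(a|s)
    rho_bar, c_bar:  clipping thresholds (hypothetical names) controlling
                     the bias/variance trade-off of the corrections
    """
    n = len(traj)
    # Per-step importance ratios, clipped at two different levels as in V-trace.
    rho = np.array([min(rho_bar, target_policy[s, a] / behavior_policy[s, a])
                    for (s, a, _, _) in traj])
    c = np.array([min(c_bar, target_policy[s, a] / behavior_policy[s, a])
                  for (s, a, _, _) in traj])
    # TD errors using the expected next Q-value under the target policy.
    delta = np.array([r + gamma * np.dot(target_policy[s2], q[s2]) - q[s, a]
                      for (s, a, r, s2) in traj])
    targets = q.copy()
    for t, (s, a, _, _) in enumerate(traj):
        correction, trace = 0.0, 1.0
        for k in range(t, n):
            correction += (gamma ** (k - t)) * trace * rho[k] * delta[k]
            trace *= c[k]
        targets[s, a] = q[s, a] + correction
    return targets
```

Raising `rho_bar` and `c_bar` toward the unclipped ratios reduces the bias of the corrections at the cost of higher variance; lowering them does the opposite, mirroring the trade-off characterized in the paper.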
Related papers
- Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms [5.945710235932345]
We consider actor-critic and natural actor-critic algorithms with function approximation for constrained Markov decision processes.
We carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d. (Markovian) setting.
We also show the results of experiments on three different Safety-Gym environments.
arXiv Detail & Related papers (2023-10-25T05:04:00Z) - Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm
with General Parameterization for Infinite Horizon Discounted Reward Markov
Decision Processes [41.61653528766776]
We propose the Accelerated Natural Policy Gradient (ANPG) algorithm, which utilizes an accelerated gradient descent process to obtain the natural policy gradient.
ANPG achieves $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity.
In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ while simultaneously matching their state-of-the-art iteration complexity.
arXiv Detail & Related papers (2023-10-18T03:00:15Z) - On the Global Convergence of Natural Actor-Critic with Two-layer Neural
Network Parametrization [38.32265770020665]
We study a natural actor-critic algorithm that utilizes neural networks to represent the critic.
Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics.
arXiv Detail & Related papers (2023-06-18T06:22:04Z) - Exploring the Algorithm-Dependent Generalization of AUPRC Optimization
with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem in machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z) - Towards Sample-Optimal Compressive Phase Retrieval with Sparse and
Generative Priors [59.33977545294148]
We show that $O(k log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function.
We adapt this result to sparse phase retrieval, and show that $O(s log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional.
arXiv Detail & Related papers (2021-06-29T12:49:54Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear
Function Approximation [5.543220407902113]
We develop a novel variant of off-policy natural actor-critic algorithm with linear function approximation.
We establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, outperforming all the previously known convergence bounds of such algorithms.
arXiv Detail & Related papers (2021-05-26T13:35:42Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.