On the Global Convergence of Natural Actor-Critic with Two-layer Neural
Network Parametrization
- URL: http://arxiv.org/abs/2306.10486v1
- Date: Sun, 18 Jun 2023 06:22:04 GMT
- Title: On the Global Convergence of Natural Actor-Critic with Two-layer Neural
Network Parametrization
- Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal
- Abstract summary: We study a natural actor-critic algorithm that utilizes neural networks to represent the critic.
Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics.
- Score: 38.32265770020665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic algorithms have shown remarkable success in solving
state-of-the-art decision-making problems. However, despite their empirical
effectiveness, their theoretical underpinnings remain relatively unexplored,
especially with neural network parametrization. In this paper, we delve into
the study of a natural actor-critic algorithm that utilizes neural networks to
represent the critic. Our aim is to establish sample complexity guarantees for
this algorithm, achieving a deeper understanding of its performance
characteristics. To achieve that, we propose a Natural Actor-Critic algorithm
with 2-Layer critic parametrization (NAC2L). Our approach involves estimating
the $Q$-function in each iteration through a convex optimization problem. We
establish that our proposed approach attains a sample complexity of
$\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$. In
contrast, the existing sample complexity results in the literature only hold
for a tabular or linear MDP. Our result, on the other hand, holds for countable
state spaces and does not require a linear or low-rank structure on the MDP.
Related papers
- Deep Unrolling for Nonconvex Robust Principal Component Analysis [75.32013242448151]
We design algorithms for Robust Component Analysis (A)
It consists in decomposing a matrix into the sum of a low Principaled matrix and a sparse Principaled matrix.
arXiv Detail & Related papers (2023-07-12T03:48:26Z) - A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement
Learning with Provable Convergence [8.191815417711194]
Deep Actorritic algorithms combine Actorritic with deep neural network (DNN)
In this paper, we propose a single-loop Actor-Critic algorithm for general interaction.
We show that the SL-Critic algorithm converges with a superior learning approximation with superior performance.
arXiv Detail & Related papers (2023-06-10T10:04:54Z) - On the Global Convergence of Fitted Q-Iteration with Two-layer Neural
Network Parametrization [33.12181620473604]
We study a Fitted Q-Iteration with two-layer ReLU neural network parametrization, and find the sample complexity guarantees for the algorithm.
We show that this approach achieves a sample complexity of $tildemathcalO (1/epsilon2)$, which is order-optimal.
arXiv Detail & Related papers (2022-11-14T19:00:24Z) - Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic
Algorithm [29.978816372127085]
We present a finite-time analysis of Natural actor-critic (NAC) with neural network approximation.
We identify the roles of neural networks, regularization and optimization techniques to achieve provably good performance.
arXiv Detail & Related papers (2022-06-02T02:13:29Z) - Near-Optimal High Probability Complexity Bounds for Non-Smooth
Stochastic Optimization with Heavy-Tailed Noise [63.304196997102494]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity with bounds dependence on the confidence level that is either negative-power or logarithmic.
We propose novel stepsize rules for two gradient methods with clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm [4.932130498861987]
We provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling.
We show that the algorithm converges to a global optimal policy with a sample complexity of $mathcalO(epsilon-3log2(1/epsilon)$ under an appropriate choice of stepsizes.
arXiv Detail & Related papers (2021-02-18T13:22:59Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Second-Order Guarantees in Centralized, Federated and Decentralized
Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying non optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms.
arXiv Detail & Related papers (2020-03-31T16:54:22Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.