On the Global Convergence of Natural Actor-Critic with Two-layer Neural
Network Parametrization
- URL: http://arxiv.org/abs/2306.10486v1
- Date: Sun, 18 Jun 2023 06:22:04 GMT
- Title: On the Global Convergence of Natural Actor-Critic with Two-layer Neural
Network Parametrization
- Authors: Mudit Gaur, Amrit Singh Bedi, Di Wang, Vaneet Aggarwal
- Abstract summary: We study a natural actor-critic algorithm that utilizes neural networks to represent the critic.
Our aim is to establish sample complexity guarantees for this algorithm, achieving a deeper understanding of its performance characteristics.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor-critic algorithms have shown remarkable success in solving
state-of-the-art decision-making problems. However, despite their empirical
effectiveness, their theoretical underpinnings remain relatively unexplored,
especially with neural network parametrization. In this paper, we delve into
the study of a natural actor-critic algorithm that utilizes neural networks to
represent the critic. Our aim is to establish sample complexity guarantees for
this algorithm, achieving a deeper understanding of its performance
characteristics. To achieve that, we propose a Natural Actor-Critic algorithm
with 2-Layer critic parametrization (NAC2L). Our approach involves estimating
the $Q$-function in each iteration through a convex optimization problem. We
establish that our proposed approach attains a sample complexity of
$\tilde{\mathcal{O}}\left(\frac{1}{\epsilon^{4}(1-\gamma)^{4}}\right)$. In
contrast, the existing sample complexity results in the literature only hold
for a tabular or linear MDP. Our result, on the other hand, holds for countable
state spaces and does not require a linear or low-rank structure on the MDP.
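For intuition, here is a minimal, self-contained sketch of a natural actor-critic loop with a two-layer ReLU critic, in the spirit of NAC2L as described above. The environment transitions are random stand-ins, and the network width, step sizes, discount factor, and the choice to refit only the critic's output layer (which makes the per-iteration critic fit a convex least-squares problem, echoing the convex $Q$-estimation step in the abstract) are all illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a natural actor-critic loop with a two-layer ReLU
# critic. All sizes, step sizes, and the stand-in environment are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 8, 3, 16                   # assumed: #states, #actions, critic width

W = rng.normal(size=(H, S + A))      # first-layer weights (held fixed here)
c = rng.normal(size=H) / np.sqrt(H)  # output weights, refit each iteration
theta = np.zeros((S, A))             # softmax policy logits

def phi(s, a):
    # One-hot state-action encoding.
    x = np.zeros(S + A)
    x[s] = 1.0
    x[S + a] = 1.0
    return x

def q_value(s, a):
    # Two-layer ReLU critic: Q(s, a) = c . relu(W phi(s, a)).
    return c @ np.maximum(W @ phi(s, a), 0.0)

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

for it in range(50):
    # Critic step: fit the output layer to sampled Bellman targets.
    # With the first layer fixed, this least-squares fit is convex in c.
    feats, targets = [], []
    for _ in range(64):
        s, a = rng.integers(S), rng.integers(A)
        r, s2 = rng.normal(), rng.integers(S)   # stand-in reward/transition
        a2 = rng.choice(A, p=policy(s2))
        targets.append(r + 0.9 * q_value(s2, a2))  # gamma = 0.9 assumed
        feats.append(np.maximum(W @ phi(s, a), 0.0))
    F, y = np.array(feats), np.array(targets)
    c = np.linalg.lstsq(F, y, rcond=None)[0]

    # Actor step: for a softmax policy, the natural policy gradient
    # reduces to shifting the logits toward the estimated Q-values.
    for s in range(S):
        theta[s] += 0.1 * np.array([q_value(s, a) for a in range(A)])
```

The actor step uses the known fact that, for a softmax tabular policy, the natural policy gradient update moves the logits toward the current $Q$-estimates; this is a simplification of the general natural-gradient update analyzed in the paper.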
Related papers
- Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms [49.19842488693726]
We establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $\mathcal{O}(\epsilon^{-3})$.
Our findings provide theoretical support for many algorithms that rely on constant step sizes.
arXiv Detail & Related papers (2024-10-11T14:46:29Z) - A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence [7.586600116278698]
A single-loop Deep Actor-Critic algorithm is proposed that combines the actor-critic framework with deep neural network (DNN) function approximation for constrained reinforcement learning, and its convergence is established.
arXiv Detail & Related papers (2023-06-10T10:04:54Z) - On the Global Convergence of Fitted Q-Iteration with Two-layer Neural
Network Parametrization [33.12181620473604]
We study a Fitted Q-Iteration with two-layer ReLU neural network parametrization, and find the sample complexity guarantees for the algorithm.
We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal (a minimal sketch of fitted Q-iteration with a two-layer critic appears after this list).
arXiv Detail & Related papers (2022-11-14T19:00:24Z) - Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic
Algorithm [29.978816372127085]
We present a finite-time analysis of the natural actor-critic (NAC) algorithm with neural network approximation.
We identify the roles of neural networks, regularization and optimization techniques to achieve provably good performance.
arXiv Detail & Related papers (2022-06-02T02:13:29Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with an unfavorable dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm [4.932130498861987]
We provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling.
We show that the algorithm converges to a globally optimal policy with a sample complexity of $\mathcal{O}(\epsilon^{-3}\log^{2}(1/\epsilon))$ under an appropriate choice of stepsizes.
arXiv Detail & Related papers (2021-02-18T13:22:59Z) - Adaptive Sampling for Best Policy Identification in Markov Decision
Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Deep neural networks for inverse problems with pseudodifferential
operators: an application to limited-angle tomography [0.4110409960377149]
We propose a novel convolutional neural network (CNN) designed for learning pseudodifferential operators ($\Psi$DOs) in the context of linear inverse problems.
We show that, under rather general assumptions on the forward operator, the unfolded iterations of ISTA can be interpreted as the successive layers of a CNN.
In particular, we prove that, in the case of LA-CT, the operations of upscaling, downscaling and convolution, can be exactly determined by combining the convolutional nature of the limited angle X-ray transform and basic properties defining a wavelet system.
arXiv Detail & Related papers (2020-06-02T14:03:41Z) - Second-Order Guarantees in Centralized, Federated and Decentralized
Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
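As referenced in the Fitted Q-Iteration entry above, the following is a hedged sketch of FQI with a two-layer ReLU critic. The stand-in environment, network width, sample sizes, and the choice to fix the first layer and refit only the output weights (keeping each regression convex) are illustrative assumptions, not that paper's implementation.

```python
# Hedged sketch of Fitted Q-Iteration with a two-layer ReLU critic.
# Environment transitions and all sizes are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(1)
S, A, H, gamma = 8, 3, 32, 0.9       # assumed sizes and discount factor

W = rng.normal(size=(H, S + A))      # fixed random first layer
c = np.zeros(H)                      # output layer, refit each iteration

def feat(s, a):
    # Two-layer ReLU features over a one-hot state-action encoding.
    x = np.zeros(S + A)
    x[s] = 1.0
    x[S + a] = 1.0
    return np.maximum(W @ x, 0.0)

for k in range(30):
    X, y = [], []
    for _ in range(128):
        s, a = rng.integers(S), rng.integers(A)
        r, s2 = rng.normal(), rng.integers(S)        # stand-in samples
        q_next = max(c @ feat(s2, a2) for a2 in range(A))
        y.append(r + gamma * q_next)                 # Bellman optimality target
        X.append(feat(s, a))
    # Refitting only the output weights keeps each iteration convex.
    c = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)[0]
```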