Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
- URL: http://arxiv.org/abs/1906.10306v3
- Date: Mon, 27 Feb 2023 21:48:13 GMT
- Title: Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
- Authors: Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang
- Abstract summary: We show that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate.
The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks.
- Score: 119.12515258771302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.
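To make the flavor of this result concrete, here is a minimal NumPy sketch of the kind of update analyzed: an energy-based policy whose energy is a wide two-layer network, updated by a KL-regularized (mirror-descent) step toward an action-value estimate and then re-fitted by regression. The toy MDP sizes, the fixed random Q stand-in, and all step sizes are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: states, actions, network width, embedding dimension.
S, A, m, d = 5, 3, 512, 8
X = rng.normal(size=(S * A, d))          # fixed random embeddings of (s, a) pairs
W = rng.normal(size=(m, d))              # first-layer weights (trained)
b = rng.choice([-1.0, 1.0], size=m)      # second-layer signs (frozen, NTK-style)

def energy(W):
    """Two-layer net f(s, a) = (1/sqrt(m)) * sum_i b_i * relu(w_i . x_{s,a})."""
    return (np.maximum(X @ W.T, 0.0) @ b / np.sqrt(m)).reshape(S, A)

def softmax_policy(f):
    z = np.exp(f - f.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

eta = 0.1
Q = rng.normal(size=(S, A))              # stand-in for a critic's Q estimate (held fixed)
for k in range(20):
    f = energy(W)                        # pi_k(a|s) ∝ exp(f_k(s, a))
    # Ideal mirror-descent step: log pi_{k+1} ∝ log pi_k + eta * Q,
    # i.e. the new energy is the old energy shifted toward Q.
    target = f + eta * Q
    # Approximate that step by regressing the network onto the new energy,
    # mimicking how the neural variant approximates the ideal iterate.
    for _ in range(10):
        err = (energy(W) - target).reshape(-1)    # (S*A,)
        mask = (X @ W.T > 0.0).astype(float)      # relu active set, (S*A, m)
        grad_W = ((err[:, None] * mask) * b).T @ X / (np.sqrt(m) * len(err))
        W -= 0.5 * grad_W

print("greedy actions:", softmax_policy(energy(W)).argmax(axis=1))
print("argmax of Q:   ", Q.argmax(axis=1))
```
Repeated steps of this form concentrate the policy on the actions that maximize Q, which is the qualitative behavior the global-convergence result makes precise.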
Related papers
- Linear Function Approximation as a Computationally Efficient Method to Solve Classical Reinforcement Learning Challenges [0.0]
We present an implementation of Natural Actor Critic algorithms with actor updates through Natural Policy Gradient methods.
We observe that our algorithm trains much faster than approaches based on complex neural network architectures and achieves equivalent or better results.
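As a rough illustration of the ingredient named here, below is a small NumPy sketch of a natural policy gradient step with linear features: the vanilla policy gradient preconditioned by the (regularized) Fisher information matrix. The toy feature tensor, the fixed stand-in advantages, and the ridge term are assumptions for illustration, not this paper's implementation.
```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 4, 3, 6
Phi = rng.normal(size=(S, A, d))       # linear features phi(s, a)
adv = rng.normal(size=(S, A))          # fixed stand-in advantage estimates

def policy(theta):
    logits = Phi @ theta
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

theta, eta = np.zeros(d), 0.5
for _ in range(100):
    pi = policy(theta)
    # Score function: phi(s, a) minus its expectation under pi(.|s).
    score = Phi - (pi[..., None] * Phi).sum(axis=1, keepdims=True)
    w = pi[..., None] * score                          # pi-weighted scores
    grad = (w * adv[..., None]).sum(axis=(0, 1)) / S   # vanilla policy gradient
    fisher = np.einsum('sad,sae->de', w, score) / S    # Fisher information
    # Natural gradient step: precondition by the (ridge-regularized) Fisher inverse.
    theta += eta * np.linalg.solve(fisher + 1e-3 * np.eye(d), grad)

print("greedy actions: ", policy(theta).argmax(axis=1))
print("argmax advantage:", adv.argmax(axis=1))
```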
arXiv Detail & Related papers (2024-05-27T22:51:58Z)
- RoPINN: Region Optimized Physics-Informed Neural Networks [66.38369833561039]
Physics-informed neural networks (PINNs) have been widely applied to solve partial differential equations (PDEs).
This paper proposes and theoretically studies a new training paradigm, region optimization.
A practical training algorithm, Region Optimized PINN (RoPINN), is seamlessly derived from this new paradigm.
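A heavily simplified sketch of the region-optimization idea as summarized above: instead of minimizing the residual at each collocation point alone, average it over points sampled in a small neighborhood. The toy one-parameter "PDE", the region radius, and the sampler are all illustrative assumptions, not RoPINN's actual algorithm.
```python
import numpy as np

rng = np.random.default_rng(0)

def residual_grad(params, x):
    """Gradient of the squared residual of a toy 'PDE' d/dx[a*sin(b*x)] = cos(x)."""
    a, b = params
    r = a * b * np.cos(b * x) - np.cos(x)
    dr_da = b * np.cos(b * x)
    dr_db = a * np.cos(b * x) - a * b * x * np.sin(b * x)
    return np.array([2 * r * dr_da, 2 * r * dr_db])

params = np.array([0.5, 0.5])            # ground truth is a = b = 1
colloc = np.linspace(0.1, np.pi, 16)     # collocation points
radius, n_samples = 0.05, 4
for _ in range(500):
    grad = np.zeros(2)
    for x in colloc:
        # Region optimization: average the gradient over points sampled in a
        # small region around x, rather than using the single point x.
        for xi in x + rng.uniform(-radius, radius, size=n_samples):
            grad += residual_grad(params, xi)
    params -= 0.02 * grad / (len(colloc) * n_samples)

print(params)   # approaches [1, 1]
```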
arXiv Detail & Related papers (2024-05-23T09:45:57Z)
- Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm [29.978816372127085]
We present a finite-time analysis of natural actor-critic (NAC) with neural network approximation.
We identify the roles of neural networks, regularization, and optimization techniques in achieving provably good performance.
arXiv Detail & Related papers (2022-06-02T02:13:29Z)
- Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates [7.094295642076582]
The mean-field regime is a theoretically attractive alternative to the NTK (lazy training) regime.
We establish a new linear convergence result for two-layer neural networks trained by continuous-time noisy gradient descent in the mean-field regime.
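For intuition, here is a minimal NumPy sketch of discrete-time noisy gradient descent on a mean-field-scaled two-layer network, where each neuron acts as one "particle" and the Gaussian noise makes this a discretized Langevin dynamic. The toy regression task, width, and temperature are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
m, d, n = 256, 4, 64                     # width (particles), input dim, samples
Xd = rng.normal(size=(n, d))
y = np.tanh(Xd @ rng.normal(size=d))     # toy regression target

W = rng.normal(size=(m, d))              # row i = neuron/"particle" i

lr, temp = 0.5, 1e-4
for _ in range(500):
    H = np.tanh(Xd @ W.T)                # (n, m)
    err = H.mean(axis=1) - y             # mean-field scaling: average over neurons
    # d pred_j / d w_i = (1/m) * (1 - tanh^2(w_i . x_j)) * x_j
    grad = (err[:, None] * (1.0 - H ** 2)).T @ Xd / (n * m)
    # Noisy step: gradient descent plus Gaussian noise, the discrete-time
    # counterpart of the continuous-time noisy dynamics analyzed there.
    W += -lr * grad + np.sqrt(2.0 * lr * temp) * rng.normal(size=W.shape)
```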
arXiv Detail & Related papers (2022-05-19T21:05:40Z)
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly, over an entire range of learning rates and in a dimension-free fashion, to the global solution.
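For concreteness, here is a tabular NumPy sketch of one KL-regularized policy-mirror-descent iteration of the kind this framework generalizes: entropy-regularized policy evaluation followed by a closed-form multiplicative-weights update. The random toy MDP and the values of eta and tau are illustrative assumptions, not GPMD's exact scheme.
```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma, tau, eta = 4, 3, 0.9, 0.1, 1.0
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
R = rng.uniform(size=(S, A))                 # rewards

def soft_eval(pi, iters=200):
    """Entropy-regularized policy evaluation by fixed-point iteration."""
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)
        Q = R + gamma * P @ V
    return Q

pi = np.full((S, A), 1.0 / A)
for _ in range(50):
    Q = soft_eval(pi)
    # KL-regularized mirror-descent step, closed form in the tabular case:
    # pi_{k+1}(a|s) ∝ pi_k(a|s)^(1/(1+eta*tau)) * exp(eta*Q(s,a)/(1+eta*tau))
    logits = (np.log(pi + 1e-12) + eta * Q) / (1.0 + eta * tau)
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi = z / z.sum(axis=1, keepdims=True)
```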
arXiv Detail & Related papers (2021-05-24T02:21:34Z)
- Training Sparse Neural Network by Constraining Synaptic Weight on Unit Lp Sphere [2.429910016019183]
Constraining the synaptic weights on the unit Lp-sphere enables flexible control of the sparsity of the network via the choice of p.
Our approach is validated by experiments on benchmark datasets covering a wide range of domains.
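A minimal sketch of the constraint as described: after each gradient step, rescale the weights back onto the unit Lp sphere (simple renormalization, used here in place of an exact Euclidean projection, which has no closed form for general p). The toy least-squares task and the choice p = 1 are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)

def to_unit_lp_sphere(w, p):
    """Rescale w so that ||w||_p = 1."""
    return w / np.sum(np.abs(w) ** p) ** (1.0 / p)

# Projected gradient descent on a toy least-squares layer; smaller p pushes
# mass onto fewer coordinates, i.e. sparser weights.
d = 16
X = rng.normal(size=(128, d))
y = X[:, 0] - 2 * X[:, 3]                # target depends on only two inputs
w = to_unit_lp_sphere(rng.normal(size=d), p=1.0)
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)
    w = to_unit_lp_sphere(w - 0.05 * grad, p=1.0)

print(np.round(w, 2))                    # most coordinates near zero
```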
arXiv Detail & Related papers (2021-03-30T01:02:31Z)
- Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics [50.83356836818667]
We introduce a new theoretical framework to analyze deep learning optimization in connection with its generalization error.
Existing frameworks for neural network optimization analysis, such as mean-field theory and neural tangent kernel (NTK) theory, typically require taking the limit of infinite network width to show global convergence.
arXiv Detail & Related papers (2020-07-11T18:19:50Z)
- Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
In particular, the target function specified by the NTK of a ReLU network can be learned at this optimal rate.
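A small sketch of the iterate averaging such rate statements refer to: run SGD on a noisy least-squares stream and keep a running (Polyak) average of the iterates; the averaged iterate is the object the cited rates bound. The linear model and step size here are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_star = rng.normal(size=d)
w, avg = np.zeros(d), np.zeros(d)

for t in range(1, 5001):
    x = rng.normal(size=d)                 # stream one sample
    y = x @ w_star + 0.1 * rng.normal()    # noisy linear observation
    w -= 0.01 * (x @ w - y) * x            # stochastic least-squares gradient step
    avg += (w - avg) / t                   # running Polyak average of iterates

# The averaged iterate smooths out the noise-level fluctuations of `w`.
print("last-iterate error:", np.linalg.norm(w - w_star))
print("averaged error:    ", np.linalg.norm(avg - w_star))
```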
arXiv Detail & Related papers (2020-06-22T14:31:37Z)
- The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions [51.60996023961886]
We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints.
Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis which was recently used to lift neural network training into convex spaces.
arXiv Detail & Related papers (2020-06-10T15:38:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.