Related papers: Weak Convergence Analysis of Online Neural Actor-Critic Algorithms

URL: http://arxiv.org/abs/2403.16825v1
Date: Mon, 25 Mar 2024 14:49:01 GMT
Title: Weak Convergence Analysis of Online Neural Actor-Critic Algorithms
Authors: Samuel Chun-Hei Lam, Justin Sirignano, Ziheng Wang
Abstract summary: In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated. We prove that the actor neural network and critic neural network converge to the solutions of a system of ODEs with random initial conditions.
Score: 5.769172579648919
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We prove that a single-layer neural network trained with the online actor-critic algorithm converges in distribution to a random ordinary differential equation (ODE) as the number of hidden units and the number of training steps $\rightarrow \infty$. In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated, which is a key challenge for any convergence analysis. We establish the geometric ergodicity of the data samples under a fixed actor policy. Then, using a Poisson equation, we prove that the fluctuations of the model updates around the limit distribution due to the randomly arriving data samples vanish as the number of parameter updates $\rightarrow \infty$. Using the Poisson equation and weak convergence techniques, we prove that the actor neural network and critic neural network converge to the solutions of a system of ODEs with random initial conditions. Analysis of the limit ODE shows that the limit critic network will converge to the true value function, which will provide the actor an asymptotically unbiased estimate of the policy gradient. We then prove that the limit actor network will converge to a stationary point.
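A minimal sketch of the kind of online actor-critic loop the abstract describes may help make the setting concrete. Everything specific here is an assumption for illustration and is not taken from the paper: the toy 3-state, 2-action MDP, the one-hot state-action features, the tanh activation, the $N^{-1/2}$ output scaling, the constant learning rate, and all function names. The point it illustrates is only that each sample is drawn under the current actor, so the data distribution shifts as the parameters are updated.

```python
import numpy as np

N_STATES, N_ACTIONS = 3, 2  # hypothetical toy MDP size (assumption)

def init_net(n_hidden, dim, rng):
    # Single hidden layer: c are output weights, w are hidden-layer weights.
    return {"c": rng.uniform(-1.0, 1.0, n_hidden),
            "w": rng.normal(size=(n_hidden, dim))}

def forward(net, x):
    # f(x) = N^{-1/2} * sum_i c_i * tanh(w_i . x)  (scaling is an assumption)
    h = np.tanh(net["w"] @ x)
    return net["c"] @ h / np.sqrt(len(net["c"])), h

def grads(net, x, h):
    # Gradients of f(x) with respect to c and w.
    scale = 1.0 / np.sqrt(len(net["c"]))
    g_c = scale * h
    g_w = scale * (net["c"] * (1.0 - h ** 2))[:, None] * x[None, :]
    return g_c, g_w

def features(s, a):
    # One-hot encoding of the state-action pair.
    x = np.zeros(N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x

def policy(actor, s):
    # Softmax policy over the actor network's outputs.
    logits = np.array([forward(actor, features(s, a))[0]
                       for a in range(N_ACTIONS)])
    p = np.exp(logits - logits.max())
    return p / p.sum()

def train(n_hidden=64, n_steps=2000, gamma=0.9, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Random transition kernel P[s, a] and rewards R[s, a] (assumptions).
    P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
    R = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))
    dim = N_STATES * N_ACTIONS
    actor, critic = init_net(n_hidden, dim, rng), init_net(n_hidden, dim, rng)
    s = 0
    a = rng.choice(N_ACTIONS, p=policy(actor, s))
    for _ in range(n_steps):
        # Data are generated under the *current* actor (non-i.i.d. samples).
        r = R[s, a]
        s_next = rng.choice(N_STATES, p=P[s, a])
        a_next = rng.choice(N_ACTIONS, p=policy(actor, s_next))
        x = features(s, a)
        q, h = forward(critic, x)
        q_next, _ = forward(critic, features(s_next, a_next))
        # Critic: semi-gradient temporal-difference update.
        delta = r + gamma * q_next - q
        g_c, g_w = grads(critic, x, h)
        critic["c"] += lr * delta * g_c
        critic["w"] += lr * delta * g_w
        # Actor: policy-gradient step using the critic's estimate q.
        # grad log pi(a|s) = grad f(s,a) - sum_b pi(b|s) grad f(s,b)
        probs = policy(actor, s)
        d_c = np.zeros_like(actor["c"])
        d_w = np.zeros_like(actor["w"])
        for b in range(N_ACTIONS):
            xb = features(s, b)
            _, hb = forward(actor, xb)
            gb_c, gb_w = grads(actor, xb, hb)
            coef = (1.0 if b == a else 0.0) - probs[b]
            d_c += coef * gb_c
            d_w += coef * gb_w
        actor["c"] += lr * q * d_c
        actor["w"] += lr * q * d_w
        s, a = s_next, a_next
    return actor, critic
```

Calling `train(n_hidden=16, n_steps=200)` returns the two parameter dictionaries, and `policy(actor, 0)` then gives the action probabilities in state 0. The sketch makes no claim about reproducing the paper's learning-rate or exploration schedules.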

Related papers

An Optimal Transport Approach for Network Regression [0.6238182916866519]
We build upon recent developments in generalized regression models on metric spaces based on Fréchet means. We propose a network regression method using the Wasserstein metric.
arXiv Detail & Related papers (2024-06-18T02:03:07Z)
A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks. We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences [0.0]
We characterize the tangents of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. These methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
arXiv Detail & Related papers (2023-08-28T13:17:39Z)
Convergence of Adam Under Relaxed Assumptions [72.24779199744954]
We show that Adam converges to $\epsilon$-stationary points with $O(\epsilon^{-4})$ gradient complexity under far more realistic conditions. We also propose a variance-reduced version of Adam with an accelerated gradient complexity of $O(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-04-27T06:27:37Z)
On the Dynamics of Inference and Learning [0.0]
We present a treatment of this Bayesian updating process as a continuous dynamical system. We show that when the Cramér-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law.
arXiv Detail & Related papers (2022-04-19T18:04:36Z)
Decentralized Sparse Linear Regression via Gradient-Tracking: Linear Convergence and Statistical Guarantees [23.256961881716595]
We study sparse linear regression over a network of agents, modeled as an undirected graph with no server node. We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
arXiv Detail & Related papers (2022-01-21T01:26:08Z)
An application of the splitting-up method for the computation of a neural network representation for the solution for the filtering equations [68.8204255655161]
Filtering equations play a central role in many real-life applications, including numerical weather prediction, finance and engineering. One of the classical approaches to approximate the solution of the filtering equations is to use a PDE inspired method, called the splitting-up method. We combine this method with a neural network representation to produce an approximation of the unnormalised conditional distribution of the signal process.
arXiv Detail & Related papers (2022-01-10T11:01:36Z)
Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years. We take a mean-field perspective on the evolution and convergence of feature-based neural AC. We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z)
Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent. We show that SGD is biased towards a simple solution. We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning [7.65995376636176]
Actor-critic algorithms are widely used in reinforcement learning, but are challenging to mathematically analyse due to the online arrival of non-i.i.d. data samples. We prove that, under a time rescaling, the online actor-critic algorithm converges to an ordinary differential equation (ODE) as the number of updates becomes large. Our convergence analysis holds under specific choices for the learning rates and exploration rates in the actor-critic algorithm, which could provide guidance for the implementation of actor-critic algorithms in practice.
arXiv Detail & Related papers (2021-08-19T12:37:58Z)
Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption. They can suffer from ill-posedness and convergence instability. This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. We show that two time-scale AC requires the overall sample complexity at the order of $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point. We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.