Global Convergence of the ODE Limit for Online Actor-Critic Algorithms
in Reinforcement Learning
- URL: http://arxiv.org/abs/2108.08655v2
- Date: Mon, 18 Sep 2023 21:30:12 GMT
- Title: Global Convergence of the ODE Limit for Online Actor-Critic Algorithms
in Reinforcement Learning
- Authors: Ziheng Wang and Justin Sirignano
- Abstract summary: Actor-critic algorithms are widely used in reinforcement learning, but are challenging to mathematically analyse due to the online arrival of non-i.i.d. data samples.
We prove that, under a time rescaling, the online actor-critic algorithm converges to an ordinary differential equation (ODE) as the number of updates becomes large.
Our convergence analysis holds under specific choices for the learning rates and exploration rates in the actor-critic algorithm, which could provide guidance for the implementation of actor-critic algorithms in practice.
- Score: 7.65995376636176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actor-critic algorithms are widely used in reinforcement learning, but are
challenging to mathematically analyse due to the online arrival of non-i.i.d.
data samples. The distribution of the data samples dynamically changes as the
model is updated, introducing a complex feedback loop between the data
distribution and the reinforcement learning algorithm. We prove that, under a
time rescaling, the online actor-critic algorithm with tabular parametrization
converges to an ordinary differential equation (ODE) as the number of updates
becomes large. The proof first establishes the geometric ergodicity of the data
samples under a fixed actor policy. Then, using a Poisson equation, we prove
that the fluctuations of the data samples around a dynamic probability measure,
which is a function of the evolving actor model, vanish as the number of
updates become large. Once the ODE limit has been derived, we study its
convergence properties using a two time-scale analysis which asymptotically
de-couples the critic ODE from the actor ODE. The convergence of the critic to
the solution of the Bellman equation and the actor to the optimal policy are
proven. In addition, a convergence rate to this global minimum is also
established. Our convergence analysis holds under specific choices for the
learning rates and exploration rates in the actor-critic algorithm, which could
provide guidance for the implementation of actor-critic algorithms in practice.
Related papers
- Actor-critic algorithms for fiber sampling problems [0.0]
We propose an actor-critic algorithm for a family of complex problems arising in statistics and discrete optimization.
We translate the problem into a Markov decision process and devise an actor-critic reinforcement learning (RL) algorithm to learn a set of good moves that can be used for sampling.
We prove that the actor-critic algorithm converges to an approximately optimal sampling policy.
arXiv Detail & Related papers (2024-05-22T19:33:58Z) - Weak Convergence Analysis of Online Neural Actor-Critic Algorithms [5.769172579648919]
In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated.
We prove that the actor neural network and critic neural network converge to the solutions of a system of ODEs with random initial conditions.
arXiv Detail & Related papers (2024-03-25T14:49:01Z) - Implementation and (Inverse Modified) Error Analysis for
implicitly-templated ODE-nets [0.0]
We focus on learning unknown dynamics from data using ODE-nets templated on implicit numerical initial value problem solvers.
We perform Inverse Modified error analysis of the ODE-nets using unrolled implicit schemes for ease of interpretation.
We formulate an adaptive algorithm which monitors the level of error and adapts the number of (unrolled) implicit solution iterations.
arXiv Detail & Related papers (2023-03-31T06:47:02Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on momentum-based variance reduced technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) for discounted MDP.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z) - SODEN: A Scalable Continuous-Time Survival Model through Ordinary
Differential Equation Networks [14.564168076456822]
We propose a flexible model for survival analysis using neural networks along with scalable optimization algorithms.
We demonstrate the effectiveness of the proposed method in comparison to existing state-of-the-art deep learning survival analysis models.
arXiv Detail & Related papers (2020-08-19T19:11:25Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires the overall sample complexity at the order of $mathcalO(epsilon-2.5log3(epsilon-1))$ to attain an $epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.