Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning
- URL: http://arxiv.org/abs/2108.08655v2
- Date: Mon, 18 Sep 2023 21:30:12 GMT
- Title: Global Convergence of the ODE Limit for Online Actor-Critic Algorithms in Reinforcement Learning
- Authors: Ziheng Wang and Justin Sirignano
- Abstract summary: Actor-critic algorithms are widely used in reinforcement learning, but are challenging to mathematically analyse due to the online arrival of non-i.i.d. data samples.
We prove that, under a time rescaling, the online actor-critic algorithm converges to an ordinary differential equation (ODE) as the number of updates becomes large.
Our convergence analysis holds under specific choices for the learning rates and exploration rates in the actor-critic algorithm, which could provide guidance for the implementation of actor-critic algorithms in practice.
- Score: 7.65995376636176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Actor-critic algorithms are widely used in reinforcement learning, but are
challenging to mathematically analyse due to the online arrival of non-i.i.d.
data samples. The distribution of the data samples dynamically changes as the
model is updated, introducing a complex feedback loop between the data
distribution and the reinforcement learning algorithm. We prove that, under a
time rescaling, the online actor-critic algorithm with tabular parametrization
converges to an ordinary differential equation (ODE) as the number of updates
becomes large. The proof first establishes the geometric ergodicity of the data
samples under a fixed actor policy. Then, using a Poisson equation, we prove
that the fluctuations of the data samples around a dynamic probability measure,
which is a function of the evolving actor model, vanish as the number of
updates becomes large. Once the ODE limit has been derived, we study its
convergence properties using a two time-scale analysis which asymptotically
decouples the critic ODE from the actor ODE. The convergence of the critic to
the solution of the Bellman equation and of the actor to the optimal policy is
proven. In addition, a convergence rate to this global minimum is
established. Our convergence analysis holds under specific choices for the
learning rates and exploration rates in the actor-critic algorithm, which could
provide guidance for the implementation of actor-critic algorithms in practice.
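
For intuition, a schematic two time-scale ODE limit (written in hypothetical notation, since the abstract does not reproduce the paper's exact limit equations) couples a fast critic flow to a slow actor flow:

    \dot{q}_t = \frac{1}{\varepsilon}\left(\mathcal{T}^{\pi_{\theta_t}} q_t - q_t\right), \qquad
    \dot{\theta}_t = \mathbb{E}_{s \sim \mu^{\pi_{\theta_t}},\, a \sim \pi_{\theta_t}(\cdot \mid s)}\left[q_t(s,a)\, \nabla_\theta \log \pi_{\theta_t}(a \mid s)\right],

where $\mathcal{T}^{\pi}$ is the Bellman operator for policy $\pi$, $\mu^{\pi}$ is the stationary distribution of the data under $\pi$, and $1/\varepsilon$ encodes the separation of time scales: as $\varepsilon \to 0$, the critic tracks the Bellman solution for the current actor, which is the asymptotic decoupling described in the abstract.

The sketch below illustrates the kind of discrete algorithm such a limit idealizes: a tabular online actor-critic on a randomly generated MDP, with a critic step size that decays more slowly than the actor's (two time scales) and a softmax policy whose temperature controls exploration. The MDP, step-size exponents, and temperature schedule here are illustrative assumptions, not the specific choices for which the paper's guarantees are stated to hold.

    import numpy as np

    # Illustrative tabular online actor-critic (not the paper's exact
    # construction): the critic is updated on a faster time scale than
    # the actor, and a softmax temperature controls exploration.
    rng = np.random.default_rng(0)
    n_states, n_actions, gamma = 5, 2, 0.9

    # Random illustrative MDP: transition kernel P[s, a] and rewards R[s, a].
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.standard_normal((n_states, n_actions))

    theta = np.zeros((n_states, n_actions))  # actor parameters (tabular)
    Q = np.zeros((n_states, n_actions))      # critic estimate of Q^pi

    def policy(s, temp):
        """Softmax policy over actions; larger temp means more exploration."""
        z = theta[s] / temp
        p = np.exp(z - z.max())
        return p / p.sum()

    s = 0
    for k in range(1, 50_001):
        # Assumed schedules: the critic step decays more slowly than the
        # actor step, and the temperature decays slowly toward a floor.
        alpha_critic = 1.0 / k**0.6
        alpha_actor = 1.0 / k**0.9
        temp = max(0.1, 1.0 / np.log(k + 2.0))

        # Act online: sample an action, then the next state and reward.
        pi_s = policy(s, temp)
        a = rng.choice(n_actions, p=pi_s)
        s_next = rng.choice(n_states, p=P[s, a])
        r = R[s, a]

        # Critic: online TD(0)-style update toward the Bellman target.
        a_next = rng.choice(n_actions, p=policy(s_next, temp))
        td_error = r + gamma * Q[s_next, a_next] - Q[s, a]
        Q[s, a] += alpha_critic * td_error

        # Actor: policy-gradient step using the critic as the action value;
        # the gradient of log softmax(theta[s]/temp) is (e_a - pi_s)/temp.
        grad_log = -pi_s
        grad_log[a] += 1.0
        theta[s] += alpha_actor * Q[s, a] * grad_log / temp

        s = s_next

Under schedules like these, the critic update dominates on short time horizons, so the critic approximately solves the Bellman equation for the current policy while the actor slowly ascends the resulting policy gradient, mirroring the decoupling used in the ODE analysis.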
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Weak Convergence Analysis of Online Neural Actor-Critic Algorithms [5.769172579648919]
In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated.
We prove that the actor neural network and critic neural network converge to the solutions of a system of ODEs with random initial conditions.
arXiv Detail & Related papers (2024-03-25T14:49:01Z)
- A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z)
- Implementation and (Inverse Modified) Error Analysis for implicitly-templated ODE-nets [0.0]
We focus on learning unknown dynamics from data using ODE-nets templated on implicit numerical initial value problem solvers.
We perform inverse modified error analysis of the ODE-nets using unrolled implicit schemes for ease of interpretation.
We formulate an adaptive algorithm which monitors the level of error and adapts the number of (unrolled) implicit solution iterations.
arXiv Detail & Related papers (2023-03-31T06:47:02Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Learning Time-Varying Graphs from Online Data [39.21234914444073]
This work proposes an algorithmic framework to learn time-varying graphs from online data.
The framework is model-independent, i.e., it can be theoretically analyzed in its abstract formulation.
We specialize the framework to three well-known graph learning models, namely the Gaussian graphical model (GGM), the structural equation model (SEM), and the smoothness-based model (SBM).
arXiv Detail & Related papers (2021-10-21T09:46:44Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy AC (DR-Off-PAC) algorithm for discounted MDPs.
DR-Off-PAC adopts a single-timescale structure, in which both the actor and the critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z)
- SODEN: A Scalable Continuous-Time Survival Model through Ordinary Differential Equation Networks [14.564168076456822]
We propose a flexible model for survival analysis using neural networks along with scalable optimization algorithms.
We demonstrate the effectiveness of the proposed method in comparison to existing state-of-the-art deep learning survival analysis models.
arXiv Detail & Related papers (2020-08-19T19:11:25Z)
- Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.