The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
- URL: http://arxiv.org/abs/2401.07844v6
- Date: Wed, 05 Feb 2025 19:20:11 GMT
- Title: The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
- Authors: Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang,
- Abstract summary: One fundamental challenge in analyzing an approximation algorithm is to establish its stability.
We extend the celebrated Borkar-Meyn theorem for stability bounded from the Martingale difference noise setting to the Markovian noise setting.
- Score: 17.493808856903303
- License:
- Abstract: Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a form of the law of the iterated logarithm.
Related papers
- Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates [61.091122503406304]
We show that the gradient bandit algorithm converges to a globally optimal policy almost surely using emphany constant learning rate.
This result demonstrates that gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down.
arXiv Detail & Related papers (2025-02-11T00:12:04Z) - A weak convergence approach to large deviations for stochastic approximations [0.9374652839580183]
We prove a large deviation principle for general approximations with state-dependent Markovian noise and decreasing step size.
Examples of learning algorithms that are covered include gradient descent, persistent contrastive divergence and the Wang-Landau algorithm.
arXiv Detail & Related papers (2025-02-04T17:50:30Z) - Byzantine-Robust Decentralized Stochastic Optimization with Stochastic
Gradient Noise-Independent Learning Error [25.15075119957447]
We study Byzantine-robust optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by gradient descent (SGD)
The performance of such a method is affected by an unknown number of Byzantine agents, which conduct adversarially during the optimization process.
arXiv Detail & Related papers (2023-08-10T02:14:23Z) - First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order variation methods.
Our approach covers both non-linear gradient and strongly Monte Carlo problems.
We provide bounds that match the oracle strongly in the case of convex method optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider a smooth and strongly convex objective function using a Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - A Closed Loop Gradient Descent Algorithm applied to Rosenbrock's
function [0.0]
We introduce a novel adaptive technique for an gradient system which finds application as a gradient descent algorithm for unconstrained inertial damping.
Also using Lyapunov stability analysis, we demonstrate the performance of the continuous numerical-time version of the algorithm.
arXiv Detail & Related papers (2021-08-29T17:25:24Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general optimization procedure is studied, unifying several variants of the gradient descent such as, among others, the heavy ball method, the Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
The avoidance is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning [0.0]
We present for the first time an convergence analysis of two time-scale approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
arXiv Detail & Related papers (2020-04-08T03:59:21Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.