Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning
- URL: http://arxiv.org/abs/2012.00805v1
- Date: Wed, 8 Apr 2020 03:59:21 GMT
- Title: Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning
- Authors: Prasenjit Karmakar
- Abstract summary: We present for the first time an convergence analysis of two time-scale approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present for the first time an asymptotic convergence analysis of two
time-scale stochastic approximation driven by "controlled" Markov noise. In
particular, the faster and slower recursions have non-additive controlled
Markov noise components in addition to martingale difference noise. We analyze
the asymptotic behavior of our framework by relating it to limiting
differential inclusions in both time scales that are defined in terms of the
ergodic occupation measures associated with the controlled Markov processes.
Using a special case of our results, we present a solution to the off-policy
convergence problem for temporal-difference learning with linear function
approximation. We compile several aspects of the dynamics of stochastic
approximation algorithms with Markov iterate-dependent noise when the iterates
are not known to be stable beforehand. We achieve the same by extending the
lock-in probability (i.e. the probability of convergence to a specific
attractor of the limiting o.d.e. given that the iterates are in its domain of
attraction after a sufficiently large number of iterations (say) n_0) framework
to such recursions. We use these results to prove almost sure convergence of
the iterates to the specified attractor when the iterates satisfy an
"asymptotic tightness" condition. This, in turn, is shown to be useful in
analyzing the tracking ability of general "adaptive" algorithms. Finally, we
obtain the first informative error bounds on function approximation for the
policy evaluation algorithm proposed by Basu et al. when the aim is to find the
risk-sensitive cost represented using exponential utility. We show that this
improvement is due to the absence of a difference term in the earlier bound, a
term that is always present in all our bounds when the state space is large.
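To make the object of study concrete, here is a minimal sketch of a generic two time-scale stochastic approximation recursion driven by a Markov chain. The chain, the mean fields `h_fast` and `g_slow`, and the step-size exponents are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Markov chain on two states; a fixed stand-in for the
# iterate-dependent ("controlled") kernels treated in the paper.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def step_chain(s):
    return rng.choice(2, p=P[s])

# Placeholder mean fields for the fast (w) and slow (theta) iterates.
def h_fast(w, theta, s):
    return -(w - theta) + 0.1 * (s - 0.5)      # drives w toward theta

def g_slow(theta, w, s):
    return -(theta - 1.0) + 0.1 * (w - theta)  # drives theta toward 1

w, theta, s = 0.0, 0.0, 0
for n in range(1, 100_000):
    a_n = 1.0 / n ** 0.6   # fast step size
    b_n = 1.0 / n          # slow step size: b_n / a_n -> 0
    s = step_chain(s)
    w += a_n * h_fast(w, theta, s)      # fast time scale
    theta += b_n * g_slow(theta, w, s)  # slow time scale

print(w, theta)  # both should settle near 1.0
```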
Related papers
- The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise [17.493808856903303]
One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability.
In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting.
Central to our analysis is the diminishing rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a commonly used V4 Lyapunov drift condition.
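For orientation, the ODE method couples a stochastic approximation recursion to a limiting ordinary differential equation. A standard textbook formulation (notation assumed here, not taken verbatim from the paper):

```latex
% Single time-scale stochastic approximation with Markovian noise Y_n
% and martingale difference noise M_{n+1}:
x_{n+1} = x_n + \alpha_n \bigl( h(x_n, Y_n) + M_{n+1} \bigr),
\qquad \sum_n \alpha_n = \infty, \quad \sum_n \alpha_n^2 < \infty .
% Averaging h over the stationary distribution \mu of the chain gives
% the limiting ODE that the iterates asymptotically track:
\dot{x}(t) = \bar{h}\bigl(x(t)\bigr), \qquad
\bar{h}(x) = \int h(x, y)\, \mu(\mathrm{d}y).
```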
arXiv Detail & Related papers (2024-01-15T17:20:17Z)
- Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent [30.84181129503133]
The last decade has witnessed an increasing number of stability results for different algorithms applied to different loss functions.
We make a novel connection between learning theory and applied probability, and introduce a unified guideline for proving stability bounds for stochastic optimization algorithms.
Our approach is flexible and can be readily generalized to other popular classes of learning algorithms.
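For context, "noisy SGD" here means a stochastic gradient step with injected Gaussian noise. A minimal sketch; the quadratic loss, batch size, and all constants are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data and least-squares loss; stand-ins only.
X = rng.normal(size=(200, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=200)

def grad(w, idx):
    xb, yb = X[idx], y[idx]
    return xb.T @ (xb @ w - yb) / len(idx)

w = np.zeros(5)
eta, sigma = 0.05, 0.01  # step size and noise scale (assumed)
for t in range(2000):
    idx = rng.choice(len(X), size=32, replace=False)
    # Noisy SGD: minibatch gradient step plus isotropic Gaussian noise,
    # the scheme whose uniform-in-time stability is at issue.
    w -= eta * grad(w, idx) + sigma * rng.normal(size=w.shape)

print(w)  # approx. the all-ones vector
```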
arXiv Detail & Related papers (2023-05-20T01:49:58Z)
- Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
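A schematic of the Hessian-averaging idea: each fresh subsampled Hessian is folded into a running average, and the averaged matrix drives the Newton step. The regularized least-squares problem and all sampling choices below are assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative strongly convex problem: regularized least squares.
A = rng.normal(size=(500, 10))
b = A @ np.ones(10)

def grad(w, idx):
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx) + 1e-3 * w

def hess(idx):
    return A[idx].T @ A[idx] / len(idx) + 1e-3 * np.eye(10)

w = np.zeros(10)
H_avg = np.eye(10)
for t in range(1, 200):
    idx = rng.choice(500, size=50, replace=False)
    # Fold the fresh subsampled Hessian into a uniform running average;
    # averaging damps the sampling noise in the Hessian estimate.
    H_avg += (hess(idx) - H_avg) / t
    w -= np.linalg.solve(H_avg, grad(w, idx))

print(w)  # approx. the all-ones vector
```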
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
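As a rough illustration, here is the population (noise-free) version of a scaled-gradient iteration for the top generalized eigenvector of a pair (A, B); the stochastic algorithm in the paper would replace A and B with streaming samples. The matrices, step size, and normalization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

d = 5
# Illustrative symmetric positive definite pair (A, B) for the
# generalized eigenvector problem max_x (x^T A x) / (x^T B x).
Qa = rng.normal(size=(d, d)); A = Qa @ Qa.T / d + np.eye(d)
Qb = rng.normal(size=(d, d)); B = Qb @ Qb.T / d + np.eye(d)

x = rng.normal(size=d)
for _ in range(5000):
    rho = (x @ A @ x) / (x @ B @ x)  # generalized Rayleigh quotient
    # Scaled-gradient ascent on the Rayleigh quotient; at a stationary
    # point A x = rho B x, i.e. (rho, x) is a generalized eigenpair.
    x = x + 0.01 * (A @ x - rho * B @ x)
    x /= np.linalg.norm(x)           # keep the iterate normalized

print(rho, np.linalg.norm(A @ x - rho * B @ x))  # residual near 0
```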
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
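For intuition, gradient clipping replaces a stochastic (sub)gradient by a norm-capped version before the step. A minimal sketch; the objective, the Student-t noise model, and all constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def clip(g, lam):
    """Rescale g so its norm is at most lam."""
    n = np.linalg.norm(g)
    return g if n <= lam else g * (lam / n)

# Illustrative convex objective f(w) = ||w - w_star||_1 with
# heavy-tailed (Student-t) gradient noise.
w_star = np.ones(5)
w = np.zeros(5)
for t in range(1, 20_000):
    g = np.sign(w - w_star) + rng.standard_t(2.5, size=5)
    # Clipping tames heavy-tailed noise, which is what enables
    # high-probability guarantees; stepsize shrinks with t.
    w -= (0.5 / np.sqrt(t)) * clip(g, lam=2.0)

print(w)  # approx. w_star
```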
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Finite-Time Convergence Rates of Nonlinear Two-Time-Scale Stochastic Approximation under Markovian Noise [2.0305676256390934]
We study the so-called two-time-scale stochastic approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators.
In particular, we consider the scenario where the data in the method are generated by Markov processes and are therefore dependent.
Under some fairly standard assumptions on the operators and the Markov processes, we provide a formula that characterizes the convergence rate of the mean square errors generated by the method to zero.
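Schematically, the coupled recursions studied in this line of work take the following form (the notation is assumed for illustration, not quoted from the paper):

```latex
% Two coupled recursions on different time scales, driven by a
% Markov process \{\xi_k\}; f and g are the two nonlinear operators:
x_{k+1} = x_k + \alpha_k \, f(x_k, y_k, \xi_k), \qquad
y_{k+1} = y_k + \beta_k \, g(x_k, y_k, \xi_k),
% with step sizes satisfying \beta_k / \alpha_k \to 0, so that
% x is the "fast" iterate and y the "slow" one.
```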
arXiv Detail & Related papers (2021-04-04T15:19:19Z)
- Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time analysis of the nonlinear two-time-scale stochastic approximation.
In particular, we show that the method achieves a convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z)
- ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
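One plausible rendering of the recursive one-over-t averaging idea is sketched below; this is our reading of the estimator, the paper's exact update may differ in details, and the quadratic objective and constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative strongly convex quadratic; the shared-noise model lets
# the recursive estimator cancel noise in the gradient difference.
w_star = np.ones(5)

def stoch_grad(w, noise):
    return (w - w_star) + 0.5 * noise

eta = 0.2
w_prev = np.zeros(5)
v = stoch_grad(w_prev, rng.normal(size=5))  # initial estimate
w = w_prev - eta * v
for t in range(2, 20_000):
    noise = rng.normal(size=5)     # one sample, two evaluations
    g_new = stoch_grad(w, noise)
    g_old = stoch_grad(w_prev, noise)
    # Recursive one-over-t averaging: the previous estimate is kept
    # with weight (1 - 1/t) and corrected by the gradient difference.
    v = g_new + (1.0 - 1.0 / t) * (v - g_old)
    w_prev, w = w, w - eta * v

print(w)  # approx. w_star
```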
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide Markov-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
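To ground the policy-evaluation setting that this entry (and the risk-sensitive bounds above) analyzes, here is a minimal tabular TD(0) sketch; the toy chain, reward vector, discount, and stepsize schedule are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy 3-state Markov reward process (the policy is already fixed).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

V = np.zeros(3)
s = 0
for t in range(1, 200_000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update: move V(s) toward the bootstrapped target.
    td_error = r[s] + gamma * V[s_next] - V[s]
    V[s] += (1.0 / t ** 0.7) * td_error
    s = s_next

# Exact value for comparison: V* = (I - gamma P)^{-1} r.
print(V, np.linalg.solve(np.eye(3) - gamma * P, r))
```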
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.