Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning
- URL: http://arxiv.org/abs/2012.00805v1
- Date: Wed, 8 Apr 2020 03:59:21 GMT
- Title: Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning
- Authors: Prasenjit Karmakar
- Abstract summary: We present for the first time an convergence analysis of two time-scale approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present for the first time an asymptotic convergence analysis of two
time-scale stochastic approximation driven by "controlled" Markov noise. In
particular, the faster and slower recursions have non-additive controlled
Markov noise components in addition to martingale difference noise. We analyze
the asymptotic behavior of our framework by relating it to limiting
differential inclusions in both time scales that are defined in terms of the
ergodic occupation measures associated with the controlled Markov processes.
Using a special case of our results, we present a solution to the off-policy
convergence problem for temporal-difference learning with linear function
approximation. We compile several aspects of the dynamics of stochastic
approximation algorithms with Markov iterate-dependent noise when the iterates
are not known to be stable beforehand. We achieve the same by extending the
lock-in probability (i.e. the probability of convergence to a specific
attractor of the limiting o.d.e. given that the iterates are in its domain of
attraction after a sufficiently large number of iterations (say) n_0) framework
to such recursions. We use these results to prove almost sure convergence of
the iterates to the specified attractor when the iterates satisfy an
"asymptotic tightness" condition. This, in turn, is shown to be useful in
analyzing the tracking ability of general "adaptive" algorithms. Finally, we
obtain the first informative error bounds on function approximation for the
policy evaluation algorithm proposed by Basu et al. when the aim is to find the
risk-sensitive cost represented using exponential utility. We show that this
improvement is due to the absence of a difference term in the earlier bound, a
term that is always present in all our bounds when the state space is large.
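To make the object of study concrete, here is a minimal sketch of a generic two time-scale stochastic approximation recursion driven by a Markov chain. The chain, the mean fields `h_fast` and `g_slow`, and the step-size exponents are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Markov chain on two states; a fixed stand-in for the
# iterate-dependent ("controlled") kernels treated in the paper.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def step_chain(s):
    return rng.choice(2, p=P[s])

# Placeholder mean fields for the fast (w) and slow (theta) iterates.
def h_fast(w, theta, s):
    return -(w - theta) + 0.1 * (s - 0.5)      # drives w toward theta

def g_slow(theta, w, s):
    return -(theta - 1.0) + 0.1 * (w - theta)  # drives theta toward 1

w, theta, s = 0.0, 0.0, 0
for n in range(1, 100_000):
    a_n = 1.0 / n ** 0.6   # fast step size
    b_n = 1.0 / n          # slow step size: b_n / a_n -> 0
    s = step_chain(s)
    w += a_n * h_fast(w, theta, s)      # fast time scale
    theta += b_n * g_slow(theta, w, s)  # slow time scale

print(w, theta)  # both should settle near 1.0
```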
Related papers
- The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise [17.493808856903303]
One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability.
In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting.
Central to our analysis is the diminishing rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a commonly used V4 Lyapunov drift condition.
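For orientation, the ODE method couples a stochastic approximation recursion to a limiting ordinary differential equation. A standard textbook formulation (notation assumed here, not taken verbatim from the paper):

```latex
% Single time-scale stochastic approximation with Markovian noise Y_n
% and martingale difference noise M_{n+1}:
x_{n+1} = x_n + \alpha_n \bigl( h(x_n, Y_n) + M_{n+1} \bigr),
\qquad \sum_n \alpha_n = \infty, \quad \sum_n \alpha_n^2 < \infty .
% Averaging h over the stationary distribution \mu of the chain gives
% the limiting ODE that the iterates asymptotically track:
\dot{x}(t) = \bar{h}\bigl(x(t)\bigr), \qquad
\bar{h}(x) = \int h(x, y)\, \mu(\mathrm{d}y).
```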
arXiv Detail & Related papers (2024-01-15T17:20:17Z)
- Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent [30.84181129503133]
The last decade has witnessed an increasing number of stability results for different algorithms applied to different loss functions.
We make a novel connection between learning theory and applied probability, and introduce a unified guideline for proving stability bounds for stochastic optimization algorithms.
Our approach is flexible and can be readily generalized to other popular classes of learning algorithms.
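For context, "noisy SGD" here means a stochastic gradient step with injected Gaussian noise. A minimal sketch; the quadratic loss, batch size, and all constants are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data and least-squares loss; stand-ins only.
X = rng.normal(size=(200, 5))
y = X @ np.ones(5) + 0.1 * rng.normal(size=200)

def grad(w, idx):
    xb, yb = X[idx], y[idx]
    return xb.T @ (xb @ w - yb) / len(idx)

w = np.zeros(5)
eta, sigma = 0.05, 0.01  # step size and noise scale (assumed)
for t in range(2000):
    idx = rng.choice(len(X), size=32, replace=False)
    # Noisy SGD: minibatch gradient step plus isotropic Gaussian noise,
    # the scheme whose uniform-in-time stability is at issue.
    w -= eta * grad(w, idx) + sigma * rng.normal(size=w.shape)

print(w)  # approx. the all-ones vector
```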
arXiv Detail & Related papers (2023-05-20T01:49:58Z)
- Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
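A schematic of the Hessian-averaging idea: each fresh subsampled Hessian is folded into a running average, and the averaged matrix drives the Newton step. The regularized least-squares problem and all sampling choices below are assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative strongly convex problem: regularized least squares.
A = rng.normal(size=(500, 10))
b = A @ np.ones(10)

def grad(w, idx):
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx) + 1e-3 * w

def hess(idx):
    return A[idx].T @ A[idx] / len(idx) + 1e-3 * np.eye(10)

w = np.zeros(10)
H_avg = np.eye(10)
for t in range(1, 200):
    idx = rng.choice(500, size=50, replace=False)
    # Fold the fresh subsampled Hessian into a uniform running average;
    # averaging damps the sampling noise in the Hessian estimate.
    H_avg += (hess(idx) - H_avg) / t
    w -= np.linalg.solve(H_avg, grad(w, idx))

print(w)  # approx. the all-ones vector
```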
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
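As a rough illustration, here is the population (noise-free) version of a scaled-gradient iteration for the top generalized eigenvector of a pair (A, B); the stochastic algorithm in the paper would replace A and B with streaming samples. The matrices, step size, and normalization are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

d = 5
# Illustrative symmetric positive definite pair (A, B) for the
# generalized eigenvector problem max_x (x^T A x) / (x^T B x).
Qa = rng.normal(size=(d, d)); A = Qa @ Qa.T / d + np.eye(d)
Qb = rng.normal(size=(d, d)); B = Qb @ Qb.T / d + np.eye(d)

x = rng.normal(size=d)
for _ in range(5000):
    rho = (x @ A @ x) / (x @ B @ x)  # generalized Rayleigh quotient
    # Scaled-gradient ascent on the Rayleigh quotient; at a stationary
    # point A x = rho B x, i.e. (rho, x) is a generalized eigenpair.
    x = x + 0.01 * (A @ x - rho * B @ x)
    x /= np.linalg.norm(x)           # keep the iterate normalized

print(rho, np.linalg.norm(A @ x - rho * B @ x))  # residual near 0
```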
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
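For intuition, gradient clipping replaces a stochastic (sub)gradient by a norm-capped version before the step. A minimal sketch; the objective, the Student-t noise model, and all constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def clip(g, lam):
    """Rescale g so its norm is at most lam."""
    n = np.linalg.norm(g)
    return g if n <= lam else g * (lam / n)

# Illustrative convex objective f(w) = ||w - w_star||_1 with
# heavy-tailed (Student-t) gradient noise.
w_star = np.ones(5)
w = np.zeros(5)
for t in range(1, 20_000):
    g = np.sign(w - w_star) + rng.standard_t(2.5, size=5)
    # Clipping tames heavy-tailed noise, which is what enables
    # high-probability guarantees; stepsize shrinks with t.
    w -= (0.5 / np.sqrt(t)) * clip(g, lam=2.0)

print(w)  # approx. w_star
```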
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Finite-Time Convergence Rates of Nonlinear Two-Time-Scale Stochastic Approximation under Markovian Noise [2.0305676256390934]
We study the so-called two-time-scale stochastic approximation, a simulation-based approach for finding the roots of two coupled nonlinear operators.
In particular, we consider the scenario where the data in the method are generated by Markov processes and are therefore dependent.
Under some fairly standard assumptions on the operators and the Markov processes, we provide a formula that characterizes the convergence rate of the mean square errors generated by the method to zero.
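Schematically, the coupled recursions studied in this line of work take the following form (the notation is assumed for illustration, not quoted from the paper):

```latex
% Two coupled recursions on different time scales, driven by a
% Markov process \{\xi_k\}; f and g are the two nonlinear operators:
x_{k+1} = x_k + \alpha_k \, f(x_k, y_k, \xi_k), \qquad
y_{k+1} = y_k + \beta_k \, g(x_k, y_k, \xi_k),
% with step sizes satisfying \beta_k / \alpha_k \to 0, so that
% x is the "fast" iterate and y the "slow" one.
```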
arXiv Detail & Related papers (2021-04-04T15:19:19Z)
- Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time analysis of the nonlinear two-time-scale stochastic approximation.
In particular, we show that the method achieves a convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z)
- ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
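One plausible rendering of the recursive one-over-t averaging idea is sketched below; this is our reading of the estimator, the paper's exact update may differ in details, and the quadratic objective and constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative strongly convex quadratic; the shared-noise model lets
# the recursive estimator cancel noise in the gradient difference.
w_star = np.ones(5)

def stoch_grad(w, noise):
    return (w - w_star) + 0.5 * noise

eta = 0.2
w_prev = np.zeros(5)
v = stoch_grad(w_prev, rng.normal(size=5))  # initial estimate
w = w_prev - eta * v
for t in range(2, 20_000):
    noise = rng.normal(size=5)     # one sample, two evaluations
    g_new = stoch_grad(w, noise)
    g_old = stoch_grad(w_prev, noise)
    # Recursive one-over-t averaging: the previous estimate is kept
    # with weight (1 - 1/t) and corrected by the gradient difference.
    v = g_new + (1.0 - 1.0 / t) * (v - g_old)
    w_prev, w = w, w - eta * v

print(w)  # approx. w_star
```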
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
- Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide Markov-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
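To ground the policy-evaluation setting that this entry (and the risk-sensitive bounds above) analyzes, here is a minimal tabular TD(0) sketch; the toy chain, reward vector, discount, and stepsize schedule are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy 3-state Markov reward process (the policy is already fixed).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

V = np.zeros(3)
s = 0
for t in range(1, 200_000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update: move V(s) toward the bootstrapped target.
    td_error = r[s] + gamma * V[s_next] - V[s]
    V[s] += (1.0 / t ** 0.7) * td_error
    s = s_next

# Exact value for comparison: V* = (I - gamma P)^{-1} r.
print(V, np.linalg.solve(np.eye(3) - gamma * P, r))
```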
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.