Central Limit Theorem for Two-Timescale Stochastic Approximation with
Markovian Noise: Theory and Applications
- URL: http://arxiv.org/abs/2401.09339v2
- Date: Tue, 13 Feb 2024 21:35:42 GMT
- Title: Central Limit Theorem for Two-Timescale Stochastic Approximation with
Markovian Noise: Theory and Applications
- Authors: Jie Hu, Vishwaraj Doshi, Do Young Eun
- Abstract summary: We conduct an in-depth analysis of two-timescale stochastic approximation (TTSA) under controlled Markovian noise via the central limit theorem (CLT).
We uncover the coupled dynamics of TTSA influenced by the underlying Markov chain, a regime not addressed by previous CLT results for TTSA, which assume only martingale difference noise.
In addition, we leverage our CLT result to deduce the statistical properties of GTD algorithms with nonlinear function approximation using Markovian samples, and show their identical asymptotic performance, a perspective not evident from current finite-time bounds.
- Score: 11.3631620309434
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two-timescale stochastic approximation (TTSA) is among the most general
frameworks for iterative stochastic algorithms. This includes well-known
stochastic optimization methods such as SGD variants and those designed for
bilevel or minimax problems, as well as reinforcement learning algorithms such
as the family of gradient-based temporal difference (GTD) algorithms. In this paper, we
conduct an in-depth asymptotic analysis of TTSA under controlled Markovian
noise via central limit theorem (CLT), uncovering the coupled dynamics of TTSA
influenced by the underlying Markov chain, which has not been addressed by
previous CLT results for TTSA with only martingale difference noise. Building
upon our CLT, we expand the application horizon of efficient sampling
strategies from vanilla SGD to a wider TTSA context in distributed learning,
thus broadening the scope of Hu et al. (2022). In addition, we leverage our CLT
result to deduce the statistical properties of GTD algorithms with nonlinear
function approximation using Markovian samples and show their identical
asymptotic performance, a perspective not evident from current finite-time
bounds.
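To fix ideas, the sketch below shows a generic TTSA iteration: two coupled iterates updated with step sizes on different timescales, driven by a state sampled from an underlying Markov chain rather than i.i.d. noise. The drift functions, chain, and step-size exponents are illustrative assumptions of this note, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state Markov chain driving the noise (assumption, not from the paper).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def h(x, y, s):
    """Fast-timescale drift; the per-state offsets are made up for illustration."""
    return -(x - y) + (0.5 if s == 0 else -0.5)

def g(x, y, s):
    """Slow-timescale drift."""
    return -(y - 0.5 * x) + (0.1 if s == 0 else -0.1)

x, y, s = 0.0, 0.0, 0
for n in range(1, 100_000):
    beta_n = 1.0 / n**0.6     # fast step size
    gamma_n = 1.0 / n**0.9    # slow step size (gamma_n / beta_n -> 0)
    x += beta_n * h(x, y, s)  # fast iterate tracks its equilibrium given y
    y += gamma_n * g(x, y, s) # slow iterate sees the "averaged" fast iterate
    s = rng.choice(2, p=P[s]) # Markovian (not i.i.d.) noise sequence

print(x, y)
```

Under the paper's conditions, the appropriately rescaled errors of such coupled iterates are asymptotically normal, with an asymptotic covariance that reflects the mixing of the driving chain and not merely the per-step noise variance.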
Related papers
- Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
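For context, tabular TD(0) for policy evaluation, the algorithm analyzed, takes the following form; the Markov reward process and constants below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-state Markov reward process (illustrative only).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, -1.0])
gamma, alpha = 0.95, 0.05

V = np.zeros(3)
s = 0
for _ in range(200_000):
    s_next = rng.choice(3, p=P[s])
    # TD(0) update: move V(s) toward the bootstrapped target r(s) + gamma * V(s').
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
    s = s_next

print(V)  # approximates the value function of the chain
```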
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic
Approximation with Markovian Noise [9.82187447690297]
We characterize a tight convergence bound for the iterations of linear two-time-scale stochastic approximation (SA) with Markovian noise.
Our results can be applied to establish the convergence behavior of a variety of RL algorithms, such as TD-learning with Polyak averaging, GTD, and GTD2.
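One covered instance, linear TD(0) with Polyak (iterate) averaging, can be sketched as follows; the chain, features, and step-size exponent are hypothetical choices of this note.

```python
import numpy as np

rng = np.random.default_rng(2)

P = np.array([[0.7, 0.3], [0.4, 0.6]])    # illustrative chain
phi = np.array([[1.0, 0.0], [0.5, 1.0]])  # illustrative features
r = np.array([1.0, -1.0])
gamma = 0.9

theta = np.zeros(2)
theta_bar = np.zeros(2)  # Polyak average of the iterates
s = 0
for n in range(1, 100_000):
    s_next = rng.choice(2, p=P[s])
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += (1.0 / n**0.7) * delta * phi[s]  # linear TD(0) step
    theta_bar += (theta - theta_bar) / n      # running Polyak average
    s = s_next

print(theta_bar)
```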
arXiv Detail & Related papers (2023-12-31T01:30:14Z) - First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order methods under Markovian noise.
Our approach covers both non-convex and strongly convex minimization problems.
We provide bounds that match the oracle complexity in the case of strongly convex optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
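A minimal sketch of that idea on a toy one-dimensional model: replace Metropolis-Hastings transitions with unadjusted Langevin steps, so every operation stays differentiable; the densities and schedule below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 1-D model: anneal from a standard-normal prior to a shifted target.
def log_prior(x):  return -0.5 * x**2
def log_target(x): return -0.5 * (x - 2.0)**2  # made-up unnormalized target

def log_annealed(x, beta):
    # Geometric path between prior (beta = 0) and target (beta = 1).
    return (1 - beta) * log_prior(x) + beta * log_target(x)

def grad_log_annealed(x, beta):
    return (1 - beta) * (-x) + beta * (2.0 - x)

K, step, n_particles = 50, 0.1, 1000
betas = np.linspace(0, 1, K + 1)
x = rng.normal(size=n_particles)  # samples from the prior
log_w = np.zeros(n_particles)

for k in range(1, K + 1):
    # Accumulate AIS weights: ratio of successive annealed densities.
    log_w += log_annealed(x, betas[k]) - log_annealed(x, betas[k - 1])
    # Unadjusted Langevin transition (no Metropolis-Hastings correction),
    # which keeps the whole procedure differentiable in model parameters.
    x += step * grad_log_annealed(x, betas[k]) + np.sqrt(2 * step) * rng.normal(size=n_particles)

# Estimated log normalizing-constant ratio; near 0 here since both endpoint
# densities integrate to the same constant.
print(np.log(np.mean(np.exp(log_w))))
```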
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Non-asymptotic estimates for TUSLA algorithm for non-convex learning
with applications to neural networks with ReLU activation function [3.5044892799305956]
We provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al.
In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances.
We show that the TUSLA algorithm converges rapidly to the optimal solution.
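The core tamed update can be sketched as follows; the double-well potential, taming power, and constants are illustrative guesses, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(4)

def grad_U(theta):
    """Gradient of a made-up non-convex potential U(x) = x^4 - 4x^2."""
    return 4 * theta**3 - 8 * theta

lam, beta, r = 1e-3, 10.0, 1  # step size, inverse temperature, taming power
theta = np.array([3.0])

for _ in range(100_000):
    # Taming: divide the drift by a polynomial in |theta| so that a
    # superlinearly growing gradient cannot make the iterate explode.
    tamed = grad_U(theta) / (1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** (2 * r))
    theta = theta - lam * tamed + np.sqrt(2 * lam / beta) * rng.normal(size=theta.shape)

print(theta)  # samples approximately from exp(-beta * U), near one of the wells
```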
arXiv Detail & Related papers (2021-07-19T07:13:02Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, non-asymptotic sense and an asymptotic sense.
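A hedged reconstruction of the recursive estimator from this summary (not the paper's verbatim pseudocode): blend each fresh stochastic gradient with the carried-over estimate using a one-over-t weight, correcting for the iterate's drift.

```python
import numpy as np

rng = np.random.default_rng(5)

def stoch_grad(x, xi):
    """Noisy gradient of an illustrative quadratic f(x) = 0.5 * |x - 1|^2."""
    return (x - 1.0) + xi

eta = 0.1
x = np.array([5.0])
x_prev, v = x.copy(), np.zeros_like(x)

for t in range(1, 50_000):
    xi = rng.normal(size=x.shape)    # one noise sample, reused below
    g_new = stoch_grad(x, xi)
    g_old = stoch_grad(x_prev, xi)   # same sample at the previous iterate
    # Recursive one-over-t averaging of past gradients:
    # keep the running estimate, corrected for how much the iterate moved.
    v = g_new + (1.0 - 1.0 / t) * (v - g_old)
    x_prev = x.copy()
    x = x - eta * v

print(x)  # should approach the minimizer at 1
```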
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Finite-Sample Analysis of Proximal Gradient TD Algorithms [43.035055641190105]
We analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.
Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP.
The results of our theoretical analysis show that the GTD family of algorithms is indeed comparable to existing LSTD methods in off-policy learning scenarios.
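GTD2 itself is a canonical two-timescale scheme; a minimal linear-features version (with a made-up environment and step sizes) looks like this:

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative policy-evaluation setup (all quantities made up).
P = np.array([[0.6, 0.4], [0.3, 0.7]])
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
r = np.array([0.5, -0.5])
gamma = 0.9

theta = np.zeros(2)  # value-function weights (slow timescale)
w = np.zeros(2)      # auxiliary weights (fast timescale)
alpha, beta = 0.01, 0.05
s = 0
for _ in range(200_000):
    s_next = rng.choice(2, p=P[s])
    f, f_next = phi[s], phi[s_next]
    delta = r[s] + gamma * f_next @ theta - f @ theta  # TD error
    w += beta * (delta - f @ w) * f                    # fast: track E[delta | features]
    theta += alpha * (f - gamma * f_next) * (f @ w)    # slow: corrected gradient step
    s = s_next

print(theta)
```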
arXiv Detail & Related papers (2020-06-06T20:16:25Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed noise in the stochastic gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We also derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
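The clipping primitive shared by both results is simple; the sketch below shows plain clipped SGD (not the accelerated clipped-SSTM) on a made-up heavy-tailed problem.

```python
import numpy as np

rng = np.random.default_rng(7)

def clip(g, lam):
    """Rescale g to norm at most lam (standard gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else g * (lam / norm)

x = np.array([10.0])
eta, lam = 0.05, 1.0
for _ in range(20_000):
    # Heavy-tailed noise: Student's t with 2.5 degrees of freedom (illustrative).
    g = (x - 3.0) + rng.standard_t(df=2.5, size=x.shape)
    x = x - eta * clip(g, lam)  # clipping bounds the impact of rare huge noise

print(x)  # should concentrate near the minimizer at 3
```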
arXiv Detail & Related papers (2020-05-21T17:05:27Z) - Convergence rates and approximation results for SGD and its
continuous-time counterpart [16.70533901524849]
This paper proposes a thorough theoretical analysis of Stochastic Gradient Descent (SGD) with non-increasing step sizes.
First, we show that SGD can be provably approximated by solutions of an inhomogeneous Stochastic Differential Equation (SDE) using coupling.
Motivated by recent analyses of deterministic and stochastic optimization methods via their continuous-time counterparts, we study the long-time behavior of the continuous processes at hand and establish non-asymptotic bounds.
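To illustrate the coupling viewpoint on a toy problem: run SGD with non-increasing steps next to an Euler-Maruyama discretization of a matching inhomogeneous SDE, feeding both the same noise draws. The objective, schedule, and diffusion coefficient are simplifications of this note.

```python
import numpy as np

rng = np.random.default_rng(8)

def grad(x):
    return x - 2.0  # gradient of the illustrative objective 0.5 * (x - 2)^2

x_sgd, x_sde = 8.0, 8.0
for k in range(1, 50_000):
    gamma = 1.0 / k**0.75  # non-increasing step-size schedule
    xi = rng.normal()
    # SGD step with a noisy gradient.
    x_sgd -= gamma * (grad(x_sgd) + xi)
    # Euler-Maruyama step (time step gamma) of the inhomogeneous SDE
    # dX_t = -grad(X_t) dt + sqrt(gamma(t)) dW_t, using the SAME noise draw:
    # this shared-noise construction is the coupling idea in miniature.
    x_sde += -gamma * grad(x_sde) + np.sqrt(gamma) * (np.sqrt(gamma) * xi)

print(x_sgd, x_sde)  # the two trajectories remain close to each other and to 2
```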
arXiv Detail & Related papers (2020-04-08T18:31:34Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$ error under a generative model.
We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z) - Stochastic Approximate Gradient Descent via the Langevin Algorithm [11.36635610546803]
We introduce stochastic approximate gradient descent (SAGD) as an alternative to stochastic gradient descent for cases where unbiased gradients cannot be trivially obtained.
We show that SAGD performs well experimentally in popular statistical and machine learning problems such as the expectation-maximization algorithm and variational autoencoders.
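A minimal sketch of the idea on a made-up Gaussian latent-variable model: when the exact gradient involves an intractable posterior expectation, approximate it with a short Langevin chain over the latent variable.

```python
import numpy as np

rng = np.random.default_rng(9)

# Toy latent-variable model (illustrative): z ~ N(theta, 1), x ~ N(z, 1).
# The score of the marginal likelihood is E[z - theta | x], which we
# approximate with a short Langevin chain targeting p(z | x, theta).
x_data = rng.normal(loc=3.0, size=200)  # synthetic data with true theta = 3
theta, eta, step, n_langevin = 0.0, 0.05, 0.1, 30

def grad_log_post(z, x, theta):
    # d/dz log p(z | x, theta) = (theta - z) + (x - z)
    return (theta - z) + (x - z)

for _ in range(2_000):
    x = rng.choice(x_data)
    z = x  # initialize the chain somewhere reasonable
    for _ in range(n_langevin):  # unadjusted Langevin over the latent variable
        z += step * grad_log_post(z, x, theta) + np.sqrt(2 * step) * rng.normal()
    theta += eta * (z - theta)   # approximate stochastic gradient ascent step

print(theta)  # should drift toward the true value 3
```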
arXiv Detail & Related papers (2020-02-13T14:29:21Z)