Constant Stepsize Q-learning: Distributional Convergence, Bias and
Extrapolation
- URL: http://arxiv.org/abs/2401.13884v1
- Date: Thu, 25 Jan 2024 02:01:53 GMT
- Title: Constant Stepsize Q-learning: Distributional Convergence, Bias and
Extrapolation
- Authors: Yixuan Zhang and Qiaomin Xie
- Abstract summary: We study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence.
By connecting the constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance.
We also establish a Central Limit Theory for Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates.
Specifically, the asymptotic bias of the averaged iterate is proportional to the stepsize up to higher-order terms, and we provide an explicit expression for the linear coefficient.
- Score: 27.17913040244775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Approximation (SA) is a widely used algorithmic approach in
various fields, including optimization and reinforcement learning (RL). Among
RL algorithms, Q-learning is particularly popular due to its empirical success.
In this paper, we study asynchronous Q-learning with constant stepsize, which
is commonly used in practice for its fast convergence. By connecting the
constant stepsize Q-learning to a time-homogeneous Markov chain, we show the
distributional convergence of the iterates in Wasserstein distance and
establish its exponential convergence rate. We also establish a Central Limit
Theory for Q-learning iterates, demonstrating the asymptotic normality of the
averaged iterates. Moreover, we provide an explicit expansion of the asymptotic
bias of the averaged iterate in stepsize. Specifically, the bias is
proportional to the stepsize up to higher-order terms and we provide an
explicit expression for the linear coefficient. This precise characterization
of the bias allows the application of Richardson-Romberg (RR) extrapolation
technique to construct a new estimate that is provably closer to the optimal Q
function. Numerical results corroborate our theoretical finding on the
improvement of the RR extrapolation method.
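To make the Richardson-Romberg step concrete, the following minimal sketch runs constant-stepsize asynchronous Q-learning at stepsizes $\alpha$ and $2\alpha$ on a toy random MDP and combines the averaged iterates as $2\bar{Q}_\alpha - \bar{Q}_{2\alpha}$, which cancels the bias term that is linear in the stepsize. The MDP, stepsizes, behavior policy, and averaging scheme are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def averaged_async_q_learning(P, R, gamma, alpha, num_steps, rng):
    """Asynchronous Q-learning with constant stepsize alpha.

    P: transition kernel, shape (S, A, S); R: rewards, shape (S, A).
    One (state, action) entry is updated per step, with actions drawn from a
    uniform behavior policy. Returns the running average of the iterates
    (a simple stand-in for the averaged iterate).
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    Q_bar = np.zeros((S, A))
    s = rng.integers(S)
    for t in range(1, num_steps + 1):
        a = rng.integers(A)                          # uniform behavior policy
        s_next = rng.choice(S, p=P[s, a])            # sample next state
        target = R[s, a] + gamma * Q[s_next].max()   # TD target
        Q[s, a] += alpha * (target - Q[s, a])        # constant-stepsize update
        Q_bar += (Q - Q_bar) / t                     # running average
        s = s_next
    return Q_bar

# Toy random MDP (illustrative assumption).
rng = np.random.default_rng(0)
S, A, gamma, alpha = 4, 2, 0.9, 0.05
P = rng.dirichlet(np.ones(S), size=(S, A))           # shape (S, A, S)
R = rng.uniform(size=(S, A))

# Richardson-Romberg extrapolation: the asymptotic bias of the averaged
# iterate is ~ c * alpha + O(alpha^2), so combining runs at alpha and
# 2 * alpha cancels the linear term.
Q_bar_1 = averaged_async_q_learning(P, R, gamma, alpha, 200_000, rng)
Q_bar_2 = averaged_async_q_learning(P, R, gamma, 2 * alpha, 200_000, rng)
Q_rr = 2 * Q_bar_1 - Q_bar_2                         # bias-reduced estimate
```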
Related papers
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
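As a rough illustration of the subset-maximization idea in this entry, the sketch below replaces the full argmax over $n$ actions with a maximization over a random subset of about $\log_2(n)$ actions; the tabular interface and subset size are assumptions, and the StochDQN/StochDDQN variants apply the same idea inside deep Q-network updates.

```python
import numpy as np

def stoch_greedy_action(q_row, rng, k=None):
    """Approximate argmax over a random subset of ~log2(n) actions.

    q_row: Q-values of a single state, shape (n,). Instead of scanning all
    n actions, sample k of them and maximize over that subset; the same
    trick can be used inside the max of the Q-learning target.
    """
    n = q_row.shape[0]
    if k is None:
        k = max(1, int(np.ceil(np.log2(n))))
    subset = rng.choice(n, size=k, replace=False)   # random candidate actions
    return subset[np.argmax(q_row[subset])]

rng = np.random.default_rng(1)
q_values = rng.normal(size=1000)        # 1000 actions, but only ~10 are scanned
action = stoch_greedy_action(q_values, rng)
```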
arXiv Detail & Related papers (2024-05-16T17:58:44Z) - Last Iterate Convergence of Incremental Methods and Applications in Continual Learning [10.811735264028348]
Motivated by applications in continual learning, we obtain convergence guarantees for the last iterate of both incremental gradient and incremental proximal methods.
We study incremental proximal methods as a model of continual learning with generalization and argue that a large amount of regularization is crucial to preventing catastrophic forgetting.
arXiv Detail & Related papers (2024-03-11T16:24:26Z) - Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements [19.524063429548278]
Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) are preeminent algorithms for min-max optimization and variational inequality problems.
Our work endeavors to quantify the structures intrinsic to these algorithms.
By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem.
arXiv Detail & Related papers (2023-06-28T18:50:07Z) - Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear stochastic differential equations.
The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to such observations.
arXiv Detail & Related papers (2023-05-24T20:43:47Z) - q-Learning in Continuous Time [1.4213973379473654]
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving underlying problems.
arXiv Detail & Related papers (2022-07-02T02:20:41Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Regularized Q-learning [6.663174194579773]
This paper develops a new Q-learning algorithm that converges when linear function approximation is used.
We experimentally show that it converges in environments where Q-learning with linear function approximation is known to diverge.
arXiv Detail & Related papers (2022-02-11T01:29:50Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
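A minimal sketch of the ingredients named in this entry, assuming a toy bilinear game $\min_x \max_y x^\top A y$ with additive noise: stochastic extragradient with a constant step size, iterate averaging, and a scheduled restart that resets the average and continues from the averaged point. The game, noise model, and restart period are assumptions for illustration.

```python
import numpy as np

def seg_restarted_averaging(A, eta, num_steps, restart_period, noise, rng):
    """Stochastic extragradient (SEG) with iterate averaging and scheduled
    restarts on the bilinear game min_x max_y x^T A y (equilibrium at 0)."""
    d = A.shape[0]

    def F(u, v):
        # Noisy game operator (grad_x, -grad_y) of x^T A y.
        return (A @ v + noise * rng.normal(size=d),
                -A.T @ u + noise * rng.normal(size=d))

    x, y = rng.normal(size=d), rng.normal(size=d)
    x_bar, y_bar, count = x.copy(), y.copy(), 1
    for t in range(1, num_steps + 1):
        gx, gy = F(x, y)                              # extrapolation step
        x_half, y_half = x - eta * gx, y - eta * gy
        gx, gy = F(x_half, y_half)                    # update at the midpoint
        x, y = x - eta * gx, y - eta * gy
        count += 1
        x_bar += (x - x_bar) / count                  # running average
        y_bar += (y - y_bar) / count
        if t % restart_period == 0:                   # scheduled restart:
            x, y = x_bar.copy(), y_bar.copy()         # continue from the average
            x_bar, y_bar, count = x.copy(), y.copy(), 1
    return x_bar, y_bar
```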
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and
Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time performance of nonlinear two-time-scale stochastic approximation.
In particular, we show that the method achieves convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
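For context, a minimal sketch of one tabular double Q-learning update of the kind these finite-time results analyze; the transition interface and stepsize are assumptions for illustration.

```python
import numpy as np

def double_q_update(Qa, Qb, s, a, r, s_next, gamma, alpha, rng):
    """One tabular double Q-learning step: two Q-tables cross-evaluate
    each other's greedy action, reducing the overestimation of the max."""
    if rng.random() < 0.5:
        a_star = np.argmax(Qa[s_next])           # table A picks the action
        target = r + gamma * Qb[s_next, a_star]  # table B evaluates it
        Qa[s, a] += alpha * (target - Qa[s, a])
    else:
        b_star = np.argmax(Qb[s_next])           # roles swapped
        target = r + gamma * Qa[s_next, b_star]
        Qb[s, a] += alpha * (target - Qb[s, a])
```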
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)