Constant Stepsize Q-learning: Distributional Convergence, Bias and
Extrapolation
- URL: http://arxiv.org/abs/2401.13884v1
- Date: Thu, 25 Jan 2024 02:01:53 GMT
- Title: Constant Stepsize Q-learning: Distributional Convergence, Bias and
Extrapolation
- Authors: Yixuan Zhang and Qiaomin Xie
- Abstract summary: We study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence.
By connecting the constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance.
We also establish a Central Limit Theory for Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates.
Specifically, the asymptotic bias of the averaged iterate is proportional to the stepsize up to higher-order terms, and we provide an explicit expression for the linear coefficient.
- Score: 27.17913040244775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Approximation (SA) is a widely used algorithmic approach in
various fields, including optimization and reinforcement learning (RL). Among
RL algorithms, Q-learning is particularly popular due to its empirical success.
In this paper, we study asynchronous Q-learning with constant stepsize, which
is commonly used in practice for its fast convergence. By connecting the
constant stepsize Q-learning to a time-homogeneous Markov chain, we show the
distributional convergence of the iterates in Wasserstein distance and
establish its exponential convergence rate. We also establish a Central Limit
Theory for Q-learning iterates, demonstrating the asymptotic normality of the
averaged iterates. Moreover, we provide an explicit expansion of the asymptotic
bias of the averaged iterate in stepsize. Specifically, the bias is
proportional to the stepsize up to higher-order terms and we provide an
explicit expression for the linear coefficient. This precise characterization
of the bias allows the application of Richardson-Romberg (RR) extrapolation
technique to construct a new estimate that is provably closer to the optimal Q
function. Numerical results corroborate our theoretical finding on the
improvement of the RR extrapolation method.
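To make the Richardson-Romberg step concrete, the following minimal sketch runs constant-stepsize asynchronous Q-learning at stepsizes $\alpha$ and $2\alpha$ on a toy random MDP and combines the averaged iterates as $2\bar{Q}_\alpha - \bar{Q}_{2\alpha}$, which cancels the bias term that is linear in the stepsize. The MDP, stepsizes, behavior policy, and averaging scheme are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def averaged_async_q_learning(P, R, gamma, alpha, num_steps, rng):
    """Asynchronous Q-learning with constant stepsize alpha.

    P: transition kernel, shape (S, A, S); R: rewards, shape (S, A).
    One (state, action) entry is updated per step, with actions drawn from a
    uniform behavior policy. Returns the running average of the iterates
    (a simple stand-in for the averaged iterate).
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    Q_bar = np.zeros((S, A))
    s = rng.integers(S)
    for t in range(1, num_steps + 1):
        a = rng.integers(A)                          # uniform behavior policy
        s_next = rng.choice(S, p=P[s, a])            # sample next state
        target = R[s, a] + gamma * Q[s_next].max()   # TD target
        Q[s, a] += alpha * (target - Q[s, a])        # constant-stepsize update
        Q_bar += (Q - Q_bar) / t                     # running average
        s = s_next
    return Q_bar

# Toy random MDP (illustrative assumption).
rng = np.random.default_rng(0)
S, A, gamma, alpha = 4, 2, 0.9, 0.05
P = rng.dirichlet(np.ones(S), size=(S, A))           # shape (S, A, S)
R = rng.uniform(size=(S, A))

# Richardson-Romberg extrapolation: the asymptotic bias of the averaged
# iterate is ~ c * alpha + O(alpha^2), so combining runs at alpha and
# 2 * alpha cancels the linear term.
Q_bar_1 = averaged_async_q_learning(P, R, gamma, alpha, 200_000, rng)
Q_bar_2 = averaged_async_q_learning(P, R, gamma, 2 * alpha, 200_000, rng)
Q_rr = 2 * Q_bar_1 - Q_bar_2                         # bias-reduced estimate
```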
Related papers
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
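As a rough illustration of the subset-maximization idea in this entry, the sketch below replaces the full argmax over $n$ actions with a maximization over a random subset of about $\log_2(n)$ actions; the tabular interface and subset size are assumptions, and the StochDQN/StochDDQN variants apply the same idea inside deep Q-network updates.

```python
import numpy as np

def stoch_greedy_action(q_row, rng, k=None):
    """Approximate argmax over a random subset of ~log2(n) actions.

    q_row: Q-values of a single state, shape (n,). Instead of scanning all
    n actions, sample k of them and maximize over that subset; the same
    trick can be used inside the max of the Q-learning target.
    """
    n = q_row.shape[0]
    if k is None:
        k = max(1, int(np.ceil(np.log2(n))))
    subset = rng.choice(n, size=k, replace=False)   # random candidate actions
    return subset[np.argmax(q_row[subset])]

rng = np.random.default_rng(1)
q_values = rng.normal(size=1000)        # 1000 actions, but only ~10 are scanned
action = stoch_greedy_action(q_values, rng)
```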
arXiv Detail & Related papers (2024-05-16T17:58:44Z) - Last Iterate Convergence of Incremental Methods and Applications in Continual Learning [10.811735264028348]
Motivated by applications in continual learning, we obtain convergence guarantees for the last iterate of both incremental gradient and incremental proximal methods.
We study incremental proximal methods as a model of continual learning with generalization and argue that a large amount of regularization is crucial to preventing catastrophic forgetting.
arXiv Detail & Related papers (2024-03-11T16:24:26Z) - Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements [19.524063429548278]
Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) are preeminent algorithms for min-max optimization and variational inequality problems.
Our work endeavors to quantify the structures intrinsic to these algorithms.
By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem.
arXiv Detail & Related papers (2023-06-28T18:50:07Z) - Non-Parametric Learning of Stochastic Differential Equations with Non-asymptotic Fast Rates of Convergence [65.63201894457404]
We propose a novel non-parametric learning paradigm for the identification of drift and diffusion coefficients of non-linear stochastic differential equations.
The key idea essentially consists of fitting an RKHS-based approximation of the corresponding Fokker-Planck equation to such observations.
arXiv Detail & Related papers (2023-05-24T20:43:47Z) - q-Learning in Continuous Time [1.4213973379473654]
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation.
We develop a q-learning" theory around the q-function that is independent of time discretization.
We devise different actor-critic algorithms for solving underlying problems.
arXiv Detail & Related papers (2022-07-02T02:20:41Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Regularized Q-learning [6.663174194579773]
This paper develops a new Q-learning algorithm that converges when linear function approximation is used.
We experimentally show that it converges in environments where Q-learning with linear function approximation is known to diverge.
arXiv Detail & Related papers (2022-02-11T01:29:50Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
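A minimal sketch of the ingredients named in this entry, assuming a toy bilinear game $\min_x \max_y x^\top A y$ with additive noise: stochastic extragradient with a constant step size, iterate averaging, and a scheduled restart that resets the average and continues from the averaged point. The game, noise model, and restart period are assumptions for illustration.

```python
import numpy as np

def seg_restarted_averaging(A, eta, num_steps, restart_period, noise, rng):
    """Stochastic extragradient (SEG) with iterate averaging and scheduled
    restarts on the bilinear game min_x max_y x^T A y (equilibrium at 0)."""
    d = A.shape[0]

    def F(u, v):
        # Noisy game operator (grad_x, -grad_y) of x^T A y.
        return (A @ v + noise * rng.normal(size=d),
                -A.T @ u + noise * rng.normal(size=d))

    x, y = rng.normal(size=d), rng.normal(size=d)
    x_bar, y_bar, count = x.copy(), y.copy(), 1
    for t in range(1, num_steps + 1):
        gx, gy = F(x, y)                              # extrapolation step
        x_half, y_half = x - eta * gx, y - eta * gy
        gx, gy = F(x_half, y_half)                    # update at the midpoint
        x, y = x - eta * gx, y - eta * gy
        count += 1
        x_bar += (x - x_bar) / count                  # running average
        y_bar += (y - y_bar) / count
        if t % restart_period == 0:                   # scheduled restart:
            x, y = x_bar.copy(), y_bar.copy()         # continue from the average
            x_bar, y_bar, count = x.copy(), y.copy(), 1
    return x_bar, y_bar
```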
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and
Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time performance of nonlinear two-time-scale stochastic approximation.
In particular, we show that the method achieves convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
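For context, a minimal sketch of one tabular double Q-learning update of the kind these finite-time results analyze; the transition interface and stepsize are assumptions for illustration.

```python
import numpy as np

def double_q_update(Qa, Qb, s, a, r, s_next, gamma, alpha, rng):
    """One tabular double Q-learning step: two Q-tables cross-evaluate
    each other's greedy action, reducing the overestimation of the max."""
    if rng.random() < 0.5:
        a_star = np.argmax(Qa[s_next])           # table A picks the action
        target = r + gamma * Qb[s_next, a_star]  # table B evaluates it
        Qa[s, a] += alpha * (target - Qa[s, a])
    else:
        b_star = np.argmax(Qb[s_next])           # roles swapped
        target = r + gamma * Qa[s_next, b_star]
        Qb[s, a] += alpha * (target - Qb[s, a])
```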
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)