Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
- URL: http://arxiv.org/abs/2504.18743v1
- Date: Fri, 25 Apr 2025 23:41:14 GMT
- Title: Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
- Authors: Zaiwei Chen,
- Abstract summary: This work presents the first finite-time analysis for the last-iterate convergence of average-reward Q-learning with an asynchronous implementation.<n>A key feature of the algorithm we study is the use of adaptive stepsizes, which serve as local clocks for each state-action pair.
- Score: 4.169915659794567
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents the first finite-time analysis for the last-iterate convergence of average-reward Q-learning with an asynchronous implementation. A key feature of the algorithm we study is the use of adaptive stepsizes, which serve as local clocks for each state-action pair. We show that the iterates generated by this Q-learning algorithm converge at a rate of $O(1/k)$ (in the mean-square sense) to the optimal relative Q-function in the span seminorm. Moreover, by adding a centering step to the algorithm, we further establish pointwise mean-square convergence to a centered optimal relative Q-function, also at a rate of $O(1/k)$. To prove these results, we show that adaptive stepsizes are necessary, as without them, the algorithm fails to converge to the correct target. In addition, adaptive stepsizes can be interpreted as a form of implicit importance sampling that counteracts the effects of asynchronous updates. Technically, the use of adaptive stepsizes makes each Q-learning update depend on the entire sample history, introducing strong correlations and making the algorithm a non-Markovian stochastic approximation (SA) scheme. Our approach to overcoming this challenge involves (1) a time-inhomogeneous Markovian reformulation of non-Markovian SA, and (2) a combination of almost-sure time-varying bounds, conditioning arguments, and Markov chain concentration inequalities to break the strong correlations between the adaptive stepsizes and the iterates. The tools developed in this work are likely to be broadly applicable to the analysis of general SA algorithms with adaptive stepsizes.
Related papers
- Randomized Pairwise Learning with Adaptive Sampling: A PAC-Bayes Analysis [32.8453673919231]
We study optimization with data-adaptive sampling schemes to train pairwise learning models.<n>A notable difference of pairwise learning from pointwise learning is the statistical ascent among input pairs.
arXiv Detail & Related papers (2025-04-03T18:24:01Z) - Towards Hyper-parameter-free Federated Learning [1.3682156035049038]
We introduce algorithms for automated scaling of global model updates.
In first algorithm, we establish that a descent-ensuring step-size regime at the clients ensures descent for the server objective.
Second algorithm shows that the average of objective values of sampled clients is a practical and effective substitute for the value server required for computing the scaling factor.
arXiv Detail & Related papers (2024-08-30T09:35:36Z) - Two-Step Q-Learning [0.0]
The paper proposes a novel off-policy two-step Q-learning algorithm, without importance sampling.
Numerical experiments demonstrate the superior performance of both the two-step Q-learning and its smooth variants.
arXiv Detail & Related papers (2024-07-02T15:39:00Z) - Regularized Q-Learning with Linear Function Approximation [2.765106384328772]
We consider a bi-level optimization formulation of regularized Q-learning with linear functional approximation.<n>We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise.
arXiv Detail & Related papers (2024-01-26T20:45:40Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for
Equality-Constrained Optimization Problems [62.83783246648714]
We propose a sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - Continuation Path with Linear Convergence Rate [18.405645120971496]
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems are solved sequentially.
We present a primal dual analysis of the path-following algorithm as well as determining how accurately each subproblem should be solved to guarantee a linear convergence rate on a target problem.
Considering optimization with a sparsity-inducing penalty, we analyze the change of the active sets with respect to the regularization parameter.
arXiv Detail & Related papers (2021-12-09T18:42:13Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires the overall sample complexity at the order of $mathcalO(epsilon-2.5log3(epsilon-1))$ to attain an $epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.