Two-Timescale Q-Learning with Function Approximation in Zero-Sum
Stochastic Games
- URL: http://arxiv.org/abs/2312.04905v1
- Date: Fri, 8 Dec 2023 08:39:36 GMT
- Title: Two-Timescale Q-Learning with Function Approximation in Zero-Sum
Stochastic Games
- Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, and Adam
Wierman
- Abstract summary: We propose a two-timescale smoothed $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players.
In two-timescale $Q$-learning, the fast-timescale iterates are updated in the spirit of stochastic gradient descent, while the slow-timescale iterates are updated as a convex combination of the previous slow-timescale iterate and the latest fast-timescale iterate.
The key novelty lies in constructing a valid Lyapunov function to capture the evolution of the slow-timescale iterates.
- Score: 31.554420227087043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider two-player zero-sum stochastic games and propose a two-timescale
$Q$-learning algorithm with function approximation that is payoff-based,
convergent, rational, and symmetric between the two players. In two-timescale
$Q$-learning, the fast-timescale iterates are updated in the spirit of
stochastic gradient descent, while the slow-timescale iterates (which we use to
compute the policies) are updated by taking a convex combination of the
previous slow-timescale iterate and the latest fast-timescale iterate.
Introducing the slow timescale and its update equation is our main algorithmic
novelty.
In the special case of linear function approximation, we establish, to the best
of our knowledge, the first last-iterate finite-sample bound for payoff-based
independent learning dynamics of these types. The result implies a polynomial
sample complexity to find a Nash equilibrium in such stochastic games.
To establish the results, we model our proposed algorithm as a two-timescale
stochastic approximation and derive the finite-sample bound through a
Lyapunov-based approach. The key novelty lies in constructing a valid Lyapunov
function to capture the evolution of the slow-timescale iterates. Specifically,
through a change of variable, we show that the update equation of the
slow-timescale iterates resembles the classical smoothed best-response
dynamics, where the regularized Nash gap serves as a valid Lyapunov function.
This insight enables us to construct a valid Lyapunov function via a
generalized variant of the Moreau envelope of the regularized Nash gap. The
construction of our Lyapunov function might be of broad independent interest in
studying the behavior of stochastic approximation algorithms.
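The two-timescale update pattern described in the abstract can be sketched in code. The following is a minimal single-agent illustration with linear features, not the authors' implementation: the function names, step sizes, and the feature/reward interface are all illustrative assumptions. The fast iterate follows an SGD-style semi-gradient TD update, while the slow iterate (used to compute policies via a smoothed, i.e. softmax, best response) is a convex combination of its previous value and the latest fast iterate.

```python
import numpy as np

def softmax(q_values, tau):
    """Smoothed best response: softmax over Q-values with temperature tau."""
    z = (q_values - q_values.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def two_timescale_step(theta_fast, theta_slow, phi, phi_next, reward,
                       alpha=0.1, beta=0.01, gamma=0.9):
    """One update of both timescales (all hyperparameters are illustrative).

    Fast timescale: SGD-style semi-gradient TD update of the Q-parameters.
    Slow timescale: convex combination of the previous slow iterate and the
    latest fast iterate (beta << alpha), mirroring the update in the abstract.
    """
    # Fast timescale: move theta_fast along the TD semi-gradient.
    td_error = (reward + gamma * np.dot(theta_fast, phi_next)
                - np.dot(theta_fast, phi))
    theta_fast = theta_fast + alpha * td_error * phi
    # Slow timescale: convex combination with the latest fast iterate.
    theta_slow = (1.0 - beta) * theta_slow + beta * theta_fast
    return theta_fast, theta_slow
```

With `beta` much smaller than `alpha`, the slow iterate changes little per step, which is what lets its evolution be analyzed against a Lyapunov function as in the paper.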
Related papers
- Calculating the expected value function of a two-stage stochastic
optimization program with a quantum algorithm [0.0]
Two-stage optimization involves calculating the expected cost of future decisions to inform the best decision in the present.
We provide a quantum algorithm to estimate the expected value function in a two-stage optimization problem in time largely independent of the complexity of the random variable.
arXiv Detail & Related papers (2024-02-23T00:07:34Z)
- Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$
Finite-Sample Complexity [2.5382095320488665]
This paper develops a new variant of two-time-scale stochastic approximation to find the roots of two coupled nonlinear operators.
Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators through their samples.
The estimated values of these averaging steps will then be used in the two-time-scale approximation updates to find the desired solution.
arXiv Detail & Related papers (2024-01-23T13:44:15Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z) - A Sublinear-Time Quantum Algorithm for Approximating Partition Functions [0.0]
We present a novel quantum algorithm for estimating Gibbs partition functions in sublinear time.
This is the first speed-up of this type over the seminal nearly-linear time algorithm of Štefankovič, Vempala, and Vigoda.
arXiv Detail & Related papers (2022-07-18T14:41:48Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Better Regularization for Sequential Decision Spaces: Fast Convergence
Rates for Nash, Correlated, and Team Equilibria [121.36609493711292]
We study the application of iterative first-order methods to the problem of computing equilibria of large-scale two-player extensive-form games.
By instantiating first-order methods with our regularizers, we develop the first accelerated first-order methods for computing correlated equilibria and ex-ante coordinated team equilibria.
arXiv Detail & Related papers (2021-05-27T06:10:24Z) - Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and
Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time analysis of nonlinear two-time-scale stochastic approximation.
In particular, we show that the method achieves convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
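Several of the related papers above analyze the generic nonlinear two-time-scale stochastic approximation template: two coupled recursions where the fast iterate tracks a quantity driven by the slow one, and the step-size ratio vanishes. The sketch below is an illustrative toy instance (the operators, noise scale, and step-size schedules are assumptions chosen so the coupled roots are at zero), not taken from any of the listed papers.

```python
import numpy as np

# Toy nonlinear two-time-scale stochastic approximation:
#   x_{k+1} = x_k + alpha_k * (f(x_k, y_k) + noise)   (fast timescale)
#   y_{k+1} = y_k + beta_k  * (g(x_k, y_k) + noise)   (slow timescale)
# with beta_k / alpha_k -> 0. Here f(x, y) = -(x - y) and g(x, y) = -y,
# so the unique coupled roots are x* = y* = 0.
rng = np.random.default_rng(0)
x, y = 1.0, 1.0
for k in range(1, 20001):
    alpha = 1.0 / k ** 0.6   # fast step size
    beta = 1.0 / k           # slow step size; beta/alpha -> 0
    noise_x, noise_y = rng.normal(scale=0.01, size=2)
    x += alpha * (-(x - y) + noise_x)   # fast iterate tracks y
    y += beta * (-y + noise_y)          # slow iterate drifts to the root
```

Because the fast iterate equilibrates between slow updates, the analysis can treat `y` as quasi-static when bounding the fast error, which is the core idea behind the finite-time rates in the entries above.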
This list is automatically generated from the titles and abstracts of the papers in this site.