Two-Timescale Q-Learning with Function Approximation in Zero-Sum
Stochastic Games
- URL: http://arxiv.org/abs/2312.04905v1
- Date: Fri, 8 Dec 2023 08:39:36 GMT
- Title: Two-Timescale Q-Learning with Function Approximation in Zero-Sum
Stochastic Games
- Authors: Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, and Adam
Wierman
- Abstract summary: We propose a two-timescale smoothed $Q$-learning algorithm with function approximation that is payoff-based, convergent, rational, and symmetric between the two players.
In two-timescale $Q$-learning, the fast-timescale iterates are updated in the spirit of stochastic gradient descent, while the slow-timescale iterates are updated as a convex combination of the previous slow-timescale iterate and the latest fast-timescale iterate.
The key novelty lies in constructing a valid Lyapunov function to capture the evolution of the slow-timescale iterates.
- Score: 31.554420227087043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider two-player zero-sum stochastic games and propose a two-timescale
$Q$-learning algorithm with function approximation that is payoff-based,
convergent, rational, and symmetric between the two players. In two-timescale
$Q$-learning, the fast-timescale iterates are updated in the spirit of
stochastic gradient descent, while the slow-timescale iterates (which we use to
compute the policies) are updated by taking a convex combination of the
previous slow-timescale iterate and the latest fast-timescale iterate.
Introducing the slow timescale and its update equation is our main algorithmic
novelty.
In the special case of linear function approximation, we establish, to the best
of our knowledge, the first last-iterate finite-sample bound for payoff-based
independent learning dynamics of these types. The result implies a polynomial
sample complexity to find a Nash equilibrium in such stochastic games.
To establish the results, we model our proposed algorithm as a two-timescale
stochastic approximation and derive the finite-sample bound through a
Lyapunov-based approach. The key novelty lies in constructing a valid Lyapunov
function to capture the evolution of the slow-timescale iterates. Specifically,
through a change of variable, we show that the update equation of the
slow-timescale iterates resembles the classical smoothed best-response
dynamics, where the regularized Nash gap serves as a valid Lyapunov function.
This insight enables us to construct a valid Lyapunov function via a
generalized variant of the Moreau envelope of the regularized Nash gap. The
construction of our Lyapunov function might be of broad independent interest in
studying the behavior of stochastic approximation algorithms.
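The two-timescale update pattern described in the abstract can be sketched in code. The following is a minimal single-agent illustration with linear features, not the authors' implementation: the function names, step sizes, and the feature/reward interface are all illustrative assumptions. The fast iterate follows an SGD-style semi-gradient TD update, while the slow iterate (used to compute policies via a smoothed, i.e. softmax, best response) is a convex combination of its previous value and the latest fast iterate.

```python
import numpy as np

def softmax(q_values, tau):
    """Smoothed best response: softmax over Q-values with temperature tau."""
    z = (q_values - q_values.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def two_timescale_step(theta_fast, theta_slow, phi, phi_next, reward,
                       alpha=0.1, beta=0.01, gamma=0.9):
    """One update of both timescales (all hyperparameters are illustrative).

    Fast timescale: SGD-style semi-gradient TD update of the Q-parameters.
    Slow timescale: convex combination of the previous slow iterate and the
    latest fast iterate (beta << alpha), mirroring the update in the abstract.
    """
    # Fast timescale: move theta_fast along the TD semi-gradient.
    td_error = (reward + gamma * np.dot(theta_fast, phi_next)
                - np.dot(theta_fast, phi))
    theta_fast = theta_fast + alpha * td_error * phi
    # Slow timescale: convex combination with the latest fast iterate.
    theta_slow = (1.0 - beta) * theta_slow + beta * theta_fast
    return theta_fast, theta_slow
```

With `beta` much smaller than `alpha`, the slow iterate changes little per step, which is what lets its evolution be analyzed against a Lyapunov function as in the paper.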
Related papers
- Calculating the expected value function of a two-stage stochastic
optimization program with a quantum algorithm [0.0]
Two-stage optimization involves calculating the expected cost of future decisions to inform the best decision in the present.
We provide a quantum algorithm to estimate the expected value function in a two-stage optimization problem in time largely independent of the complexity of the random variable.
arXiv Detail & Related papers (2024-02-23T00:07:34Z)
- Fast Nonlinear Two-Time-Scale Stochastic Approximation: Achieving $O(1/k)$
Finite-Sample Complexity [2.5382095320488665]
This paper develops a new variant of two-time-scale stochastic approximation to find the roots of two coupled nonlinear operators.
Our key idea is to leverage the classic Ruppert-Polyak averaging technique to dynamically estimate the operators through their samples.
The estimated values of these averaging steps will then be used in the two-time-scale approximation updates to find the desired solution.
arXiv Detail & Related papers (2024-01-23T13:44:15Z) - Gradient-free optimization of highly smooth functions: improved analysis
and a new algorithm [87.22224691317766]
This work studies problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth.
We consider two kinds of zero-order projected gradient descent algorithms.
arXiv Detail & Related papers (2023-06-03T17:05:13Z) - Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z) - A Sublinear-Time Quantum Algorithm for Approximating Partition Functions [0.0]
We present a novel quantum algorithm for estimating Gibbs partition functions in sublinear time.
This is the first speed-up of this type over the seminal nearly-linear time algorithm of Štefankovič, Vempala, and Vigoda.
arXiv Detail & Related papers (2022-07-18T14:41:48Z) - Hessian Averaging in Stochastic Newton Methods Achieves Superlinear
Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z) - Better Regularization for Sequential Decision Spaces: Fast Convergence
Rates for Nash, Correlated, and Team Equilibria [121.36609493711292]
We study the application of iterative first-order methods to the problem of computing equilibria of large-scale two-player extensive-form games.
By instantiating first-order methods with our regularizers, we develop the first accelerated first-order methods for computing correlated equilibria and ex-ante coordinated team equilibria.
arXiv Detail & Related papers (2021-05-27T06:10:24Z) - Nonlinear Two-Time-Scale Stochastic Approximation: Convergence and
Finite-Time Performance [1.52292571922932]
We study the convergence and finite-time analysis of nonlinear two-time-scale stochastic approximation.
In particular, we show that the method achieves convergence in expectation at a rate $\mathcal{O}(1/k^{2/3})$, where $k$ is the number of iterations.
arXiv Detail & Related papers (2020-11-03T17:43:39Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
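Several of the related papers above analyze the generic nonlinear two-time-scale stochastic approximation template: two coupled recursions where the fast iterate tracks a quantity driven by the slow one, and the step-size ratio vanishes. The sketch below is an illustrative toy instance (the operators, noise scale, and step-size schedules are assumptions chosen so the coupled roots are at zero), not taken from any of the listed papers.

```python
import numpy as np

# Toy nonlinear two-time-scale stochastic approximation:
#   x_{k+1} = x_k + alpha_k * (f(x_k, y_k) + noise)   (fast timescale)
#   y_{k+1} = y_k + beta_k  * (g(x_k, y_k) + noise)   (slow timescale)
# with beta_k / alpha_k -> 0. Here f(x, y) = -(x - y) and g(x, y) = -y,
# so the unique coupled roots are x* = y* = 0.
rng = np.random.default_rng(0)
x, y = 1.0, 1.0
for k in range(1, 20001):
    alpha = 1.0 / k ** 0.6   # fast step size
    beta = 1.0 / k           # slow step size; beta/alpha -> 0
    noise_x, noise_y = rng.normal(scale=0.01, size=2)
    x += alpha * (-(x - y) + noise_x)   # fast iterate tracks y
    y += beta * (-y + noise_y)          # slow iterate drifts to the root
```

Because the fast iterate equilibrates between slow updates, the analysis can treat `y` as quasi-static when bounding the fast error, which is the core idea behind the finite-time rates in the entries above.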
This list is automatically generated from the titles and abstracts of the papers in this site.