Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games
- URL: http://arxiv.org/abs/2511.02157v1
- Date: Tue, 04 Nov 2025 00:54:54 GMT
- Title: Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games
- Authors: Asrin Efe Yorulmaz, Tamer Başar
- Abstract summary: No-regret learning dynamics play a central role in game theory, enabling decentralized convergence to equilibrium. We improve the convergence rate to CCE in general-sum Markov games, reducing it from the previously best-known rate of $\mathcal{O}(\log^5 T / T)$ to a sharper $\mathcal{O}(\log T / T)$. This matches the best known convergence rate for CE in terms of $T$, the number of iterations, while also improving the dependence on the action set size.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: No-regret learning dynamics play a central role in game theory, enabling decentralized convergence to equilibrium for concepts such as Coarse Correlated Equilibrium (CCE) or Correlated Equilibrium (CE). In this work, we improve the convergence rate to CCE in general-sum Markov games, reducing it from the previously best-known rate of $\mathcal{O}(\log^5 T / T)$ to a sharper $\mathcal{O}(\log T / T)$. This matches the best known convergence rate for CE in terms of $T$, the number of iterations, while also improving the dependence on the action set size from polynomial to polylogarithmic, yielding exponential gains in high-dimensional settings. Our approach builds on recent advances in adaptive step-size techniques for no-regret algorithms in normal-form games, and extends them to the Markovian setting via a stage-wise scheme that adjusts learning rates based on real-time feedback. We frame policy updates as an instance of Optimistic Follow-the-Regularized-Leader (OFTRL), customized for value-iteration-based learning. The resulting self-play algorithm achieves, to our knowledge, the fastest known convergence rate to CCE in Markov games.
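As a concrete illustration of the update primitive, here is a minimal self-play sketch of OFTRL with entropy regularization on a random normal-form game. The $1/\sqrt{t}$ step size and all names are illustrative placeholders: the paper's feedback-driven stage-wise learning rates and its value-iteration layer for Markov games are not reproduced here.

```python
import numpy as np

def oftrl_step(cum_grads, pred_grad, eta):
    """One Optimistic FTRL step over the simplex with entropy regularizer:
    argmax_x <x, cum_grads + pred_grad> - (1/eta) * sum_i x_i log x_i
    has the closed form softmax(eta * (cum_grads + pred_grad))."""
    logits = eta * (cum_grads + pred_grad)
    logits -= logits.max()            # numerical stability
    x = np.exp(logits)
    return x / x.sum()

# Illustrative self-play on a random two-player general-sum matrix game.
rng = np.random.default_rng(0)
A, B = rng.random((5, 5)), rng.random((5, 5))   # payoffs for row / column player
Gx, Gy = np.zeros(5), np.zeros(5)               # cumulative utility gradients
gx, gy = np.zeros(5), np.zeros(5)               # last gradients, used as predictions
for t in range(1, 1001):
    eta = 1.0 / np.sqrt(t)          # placeholder schedule, NOT the paper's
    x = oftrl_step(Gx, gx, eta)
    y = oftrl_step(Gy, gy, eta)
    gx, gy = A @ y, B.T @ x         # each player's utility gradient
    Gx += gx
    Gy += gy
```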
Related papers
- Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games
We develop and analyze algorithms that provably achieve improved sample efficiency under reverse Kullback-Leibler (KL) regularization.
We study both two-player zero-sum matrix games and Markov games: for matrix games, we propose OMG, an algorithm based on best-response sampling with optimistic bonuses, and extend this idea to Markov games through the algorithm SOMG.
Both algorithms achieve a logarithmic regret in $T$ that scales inversely with the KL regularization strength $\beta$, in addition to the standard $\widetilde{\mathcal{O}}(\sqrt{T})$ regret (the closed-form KL-regularized best response is sketched after this entry).
arXiv Detail & Related papers (2025-10-15T01:00:54Z)
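As background for the reverse-KL setting, the best response under a reverse-KL penalty toward a reference policy has a closed softmax form. Below is a minimal sketch of that fact only; the optimistic bonuses and sampling schemes of OMG/SOMG are not reproduced, and all names are illustrative.

```python
import numpy as np

def kl_regularized_best_response(q_values, pi_ref, beta):
    """Maximizes <pi, q> - beta * KL(pi || pi_ref) over the simplex.
    Closed-form solution: pi(a) proportional to pi_ref(a) * exp(q(a) / beta)."""
    logits = np.log(pi_ref) + q_values / beta
    logits -= logits.max()                  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

q = np.array([1.0, 0.5, 0.0])
pi_ref = np.ones(3) / 3
print(kl_regularized_best_response(q, pi_ref, beta=0.1))   # near-greedy
print(kl_regularized_best_response(q, pi_ref, beta=10.0))  # close to pi_ref
```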
- Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games
We provide an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium.
Our algorithm combines two main elements: (i) smooth value updates and (ii) the optimistic follow-the-regularized-leader (OFTRL) algorithm with the log-barrier regularizer (a log-barrier OFTRL step is sketched after this entry).
arXiv Detail & Related papers (2024-01-26T23:13:47Z)
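A minimal sketch of the log-barrier OFTRL step over the simplex follows. Unlike the entropy case there is no softmax closed form; this sketch solves the KKT conditions by bisection, which is one standard route. The cited paper's smooth value updates and exact constants are not reproduced.

```python
import numpy as np

def oftrl_log_barrier_step(cum_grads, pred_grad, eta, tol=1e-12):
    """OFTRL step over the simplex with the log-barrier regularizer:
    argmax_x <x, g> + (1/eta) * sum_i log x_i,  where g = cum_grads + pred_grad.
    KKT gives x_i = 1 / (eta * (lam - g_i)) with multiplier lam > max_i g_i
    chosen so that sum_i x_i = 1; lam is found by bisection."""
    g = np.asarray(cum_grads) + np.asarray(pred_grad)
    n = len(g)
    lo, hi = g.max() + tol, g.max() + n / eta   # sum(x) >= 1 at lo, <= 1 at hi
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        x = 1.0 / (eta * (lam - g))
        if x.sum() > 1.0:
            lo = lam
        else:
            hi = lam
    return x / x.sum()                          # remove residual bisection error

# e.g. one update given running totals (illustrative values)
x = oftrl_log_barrier_step(np.array([1.0, 0.2, -0.5]),
                           np.array([0.1, 0.0, 0.0]), eta=10.0)
```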
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters (an adaptive-step-size OGD loop is sketched after this entry).
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
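For flavor, here is a minimal projected OGD loop with an AdaGrad-norm step size, which likewise needs no prior knowledge of curvature constants. This illustrates the adaptivity, but it is not the paper's AdaOGD schedule; names are illustrative.

```python
import numpy as np

def adaptive_ogd(grad_fn, x0, T, radius=1.0):
    """Projected OGD on a Euclidean ball with an AdaGrad-norm step size:
    eta_t shrinks with the observed gradient energy, so no strong-convexity
    or monotonicity constants are needed upfront. (Illustrative of
    adaptivity; not the AdaOGD schedule itself.)"""
    x = x0.copy()
    grad_energy = 0.0
    for _ in range(T):
        g = grad_fn(x)                               # gradient feedback
        grad_energy += float(g @ g)
        eta = radius / (np.sqrt(grad_energy) + 1e-12)
        x = x - eta * g
        norm = np.linalg.norm(x)
        if norm > radius:                            # project onto the ball
            x *= radius / norm
    return x

# e.g. online quadratic loss f(x) = ||x - 0.3||^2
x_final = adaptive_ogd(lambda x: 2.0 * (x - 0.3), np.zeros(3), T=500)
```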
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method (a schematic OMWU step is sketched after this entry).
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
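A schematic entropy-regularized OMWU loop for a zero-sum matrix game is sketched below: the exponent $1 - \eta\tau$ implements the entropy regularization as a geometric pull toward uniform. The exact scheduling of half and full steps in the cited paper differs in details, and all constants here are placeholders.

```python
import numpy as np

def omwu_entropy(A, eta=0.1, tau=0.1, T=2000):
    """Schematic entropy-regularized optimistic MWU on a zero-sum matrix game.
    Max player controls x (payoff x^T A y), min player controls y; each
    update has the form p' proportional to p^(1 - eta*tau) * exp(eta * grad)."""
    n, m = A.shape
    x, y = np.ones(n) / n, np.ones(m) / m
    x_hat, y_hat = x.copy(), y.copy()

    def update(base, grad):
        logits = (1 - eta * tau) * np.log(base) + eta * grad
        logits -= logits.max()        # numerical stability
        p = np.exp(logits)
        return p / p.sum()

    for _ in range(T):
        # half step: refresh the predictions using the previous predictions
        x_hat, y_hat = update(x, A @ y_hat), update(y, -A.T @ x_hat)
        # full step: move the main iterates against the refreshed predictions
        x, y = update(x, A @ y_hat), update(y, -A.T @ x_hat)
    return x, y
```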
- Policy Optimization for Markov Games: Unified Framework and Faster Convergence
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general-sum Markov games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to coarse correlated equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z)
- The Complexity of Markov Equilibrium in Stochastic Games
We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum games is computationally intractable.
Our results stand in sharp contrast to normal-form games where exact CCEs are efficiently computable.
arXiv Detail & Related papers (2022-04-08T10:51:01Z)
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game.
We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium (the external-to-swap-regret reduction is sketched after this entry).
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
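The external-to-swap-regret route mentioned above follows a Blum-Mansour-style reduction: run one external-regret learner per action, play the stationary distribution of the row-stochastic matrix of their recommendations, and credit each learner with a scaled utility. A minimal sketch with plain MWU inside (the cited result instead uses OMWU to obtain polylog regret):

```python
import numpy as np

def stationary(Q, iters=200):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    p = np.ones(Q.shape[0]) / Q.shape[0]
    for _ in range(iters):
        p = p @ Q
    return p / p.sum()

class SwapRegretWrapper:
    """Blum-Mansour-style reduction: one external-regret learner per action
    (plain MWU here; the cited result uses OMWU) yields swap-regret bounds."""

    def __init__(self, n_actions, eta=0.1):
        self.eta = eta
        self.weights = np.ones((n_actions, n_actions))  # row a = learner a

    def act(self):
        Q = self.weights / self.weights.sum(axis=1, keepdims=True)
        self.p = stationary(Q)      # play the stationary distribution of Q
        return self.p

    def update(self, utilities):
        # learner a is credited with the utility vector scaled by p(a)
        self.weights *= np.exp(self.eta * np.outer(self.p, utilities))
        self.weights /= self.weights.max(axis=1, keepdims=True)  # avoid overflow

# toy usage with fixed utilities
learner = SwapRegretWrapper(n_actions=3)
for _ in range(200):
    p = learner.act()
    learner.update(np.array([1.0, 0.5, 0.2]))
```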