Scale-Invariant Fast Convergence in Games
- URL: http://arxiv.org/abs/2602.11857v1
- Date: Thu, 12 Feb 2026 11:57:20 GMT
- Title: Scale-Invariant Fast Convergence in Games
- Authors: Taira Tsuchiya, Haipeng Luo, Shinji Ito
- Abstract summary: We develop learning dynamics that achieve fast convergence while being both scale-free and scale-invariant. For two-player zero-sum games, we obtain scale-free and scale-invariant dynamics with external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$. For multiplayer general-sum games, scale-free learning is enabled also by a technique called doubling clipping, which clips observed gradients based on past observations.
- Score: 67.02769061793619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scale-invariance in games has recently emerged as a widely valued desirable property. Yet, almost all fast convergence guarantees in learning in games require prior knowledge of the utility scale. To address this, we develop learning dynamics that achieve fast convergence while being both scale-free, requiring no prior information about utilities, and scale-invariant, remaining unchanged under positive rescaling of utilities. For two-player zero-sum games, we obtain scale-free and scale-invariant dynamics with external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$, where $A_{\mathrm{diff}}$ is the payoff range, which implies an $\tilde{O}(A_{\mathrm{diff}} / T)$ convergence rate to Nash equilibrium after $T$ rounds. For multiplayer general-sum games with $n$ players and $m$ actions, we obtain scale-free and scale-invariant dynamics with swap regret bounded by $O(U_{\mathrm{max}} \log T)$, where $U_{\mathrm{max}}$ is the range of the utilities, ignoring the dependence on the number of players and actions. This yields an $O(U_{\mathrm{max}} \log T / T)$ convergence rate to correlated equilibrium. Our learning dynamics are based on optimistic follow-the-regularized-leader with an adaptive learning rate that incorporates the squared path length of the opponents' gradient vectors, together with a new stopping-time analysis that exploits negative terms in regret bounds without scale-dependent tuning. For general-sum games, scale-free learning is enabled also by a technique called doubling clipping, which clips observed gradients based on past observations.
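The abstract describes two mechanisms: an adaptive learning rate driven by the squared path length of observed gradients, and doubling clipping of the observations. A minimal sketch of how these two pieces might fit together in an optimistic-FTRL loop over the probability simplex is given below. All constants, the exact clipping schedule, and the function name `oftrl_scale_free` are illustrative assumptions, not the paper's precise algorithm or tuning.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def oftrl_scale_free(utility_grads, n_actions):
    """Sketch of a scale-free optimistic-FTRL loop over the simplex.

    Returns the list of mixed strategies played. Illustrative only:
    constant factors and the clipping schedule are assumptions.
    """
    cum = np.zeros(n_actions)   # cumulative clipped utility gradients
    prev = np.zeros(n_actions)  # previous clipped gradient (optimistic hint)
    path_sq = 1e-12             # squared path length (tiny floor avoids 0-division)
    clip = 0.0                  # clipping threshold, grown only by doubling
    plays = []
    for g in utility_grads:
        # Doubling clipping: clip the CURRENT gradient with a threshold
        # determined by PAST observations only, then grow the threshold
        # by doubling whenever a new observation exceeds it.
        g_c = np.clip(g, -clip, clip)
        norm = float(np.max(np.abs(g)))
        if clip == 0.0:
            clip = norm
        while 0.0 < clip < norm:
            clip *= 2.0
        # Adaptive learning rate from the squared path length of the
        # clipped gradients seen so far; rescaling all utilities by c > 0
        # scales (cum + prev) by c and eta by 1/c, leaving play unchanged.
        eta = 1.0 / np.sqrt(path_sq)
        # Optimistic FTRL with entropy regularizer: softmax of the
        # cumulative gradient plus a one-step optimistic prediction.
        x = softmax(eta * (cum + prev))
        plays.append(x)
        path_sq += float(np.sum((g_c - prev) ** 2))
        cum += g_c
        prev = g_c
    return plays
```

Note how scale-invariance emerges from the pairing: the learning rate shrinks exactly in proportion to the growth of the gradient magnitudes, so the softmax argument is unchanged under positive rescaling of the utilities (up to the tiny numerical floor).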
Related papers
- Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games [60.871651115241406]
A considerable chasm has loomed for decades between theory and practice in zero-sum game solving through first-order methods. We propose a new scale-invariant and parameter-free variant of PRM$+$, which we call IREG-PRM$+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ optimal convergence guarantees, while also being on par with PRM$+$ on benchmark games.
arXiv Detail & Related papers (2025-10-06T00:33:20Z) - From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications [54.49053278073321]
We show that for a large family of games, there exists a simple black-box reduction that transforms the average iterates of an uncoupled learning dynamics into the last iterates of a new uncoupled learning dynamics. Our reduction applies to games where each player's utility is linear in both their own strategy and the joint strategy of all opponents.
arXiv Detail & Related papers (2025-06-04T00:24:14Z) - Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
We show that when a pure strategy Nash equilibrium exists, $c$ becomes zero, leading to an optimal instance-dependent regret bound. Our algorithm also enjoys last-iterate convergence and can identify the pure strategy Nash equilibrium with near-optimal sample complexity.
arXiv Detail & Related papers (2025-02-24T20:20:06Z) - Corrupted Learning Dynamics in Games [62.73758165845971]
An equilibrium can be computed at a fast rate of $O(\log T)$ when all players follow optimistic follow-the-regularized-leader (OFTRL). We present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm.
arXiv Detail & Related papers (2024-12-10T02:23:44Z) - Local Convergence of Gradient Methods for Min-Max Games: Partial Curvature Generically Suffices [0.5439020425819]
We study the convergence to local Nash equilibria of gradient methods for two-player zero-sum differentiable games.
We show that thanks to partial curvature, conic particle methods -- which optimize over both weights and supports of the mixed strategies -- generically converge faster than fixed-support methods.
arXiv Detail & Related papers (2023-05-26T21:43:56Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal no-regret guarantees can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation [11.091975655053547]
We study the effect a finite timescale separation parameter $\tau$ has on gradient descent-ascent in non-convex, non-concave zero-sum games.
We empirically demonstrate the significant impact timescale separation has on performance.
arXiv Detail & Related papers (2020-09-30T17:51:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.