Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
- URL: http://arxiv.org/abs/2510.04407v2
- Date: Mon, 13 Oct 2025 20:08:52 GMT
- Title: Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
- Authors: Brian Hu Zhang, Ioannis Anagnostides, Tuomas Sandholm
- Abstract summary: A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ (optimal) average-iterate convergence guarantees, while also being on par with PRM$^+$ on benchmark games.
- Score: 60.871651115241406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. Although a convergence rate of $T^{-1}$ has long been established since Nemirovski's mirror-prox algorithm and Nesterov's excessive gap technique in the early 2000s, the most effective paradigm in practice is *counterfactual regret minimization*, which is based on *regret matching* and its modern variants. In particular, the state of the art across most benchmarks is *predictive* regret matching$^+$ (PRM$^+$), in conjunction with non-uniform averaging. Yet, such algorithms can exhibit slower $\Omega(T^{-1/2})$ convergence even in self-play. In this paper, we close the gap between theory and practice. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ (i.e., optimal) average-iterate convergence guarantees, while also being on par with PRM$^+$ on benchmark games. From a technical standpoint, we draw an analogy between IREG-PRM$^+$ and optimistic gradient descent with *adaptive* learning rate. The basic flaw of PRM$^+$ is that the ($\ell_2$-)norm of the regret vector -- which can be thought of as the inverse of the learning rate -- can decrease. By contrast, we design IREG-PRM$^+$ so as to maintain the invariance that the norm of the regret vector is nondecreasing. This enables us to derive an RVU-type bound for IREG-PRM$^+$, the first such property that does not rely on introducing additional hyperparameters to enforce smoothness. Furthermore, we find that IREG-PRM$^+$ performs on par with an adaptive version of optimistic gradient descent that we introduce whose learning rate depends on the misprediction error, demystifying the effectiveness of the regret matching family *vis-a-vis* more standard optimization techniques.
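As a point of reference for the dynamics the paper builds on, the following is a minimal self-play sketch of standard predictive regret matching$^+$ (PRM$^+$) on a small zero-sum matrix game. It is not the paper's IREG-PRM$^+$ (which additionally maintains the invariant that the $\ell_2$-norm of the regret vector is nondecreasing), and it uses uniform averaging rather than the non-uniform averaging used in practice; the payoff matrix is a toy example chosen for illustration.

```python
import numpy as np

def normalize(z):
    """Project a nonnegative vector onto the simplex; fall back to uniform."""
    s = z.sum()
    return z / s if s > 0 else np.full_like(z, 1.0 / len(z))

def prm_plus(A, T=10000):
    """Self-play with predictive regret matching+ (PRM+) on the zero-sum
    matrix game where the row player maximizes x^T A y.
    Returns uniform-average strategies (a simplification; the paper notes
    that non-uniform averaging is used in practice)."""
    m, n = A.shape
    zx, zy = np.zeros(m), np.zeros(n)            # accumulated (truncated) regrets
    rx_prev, ry_prev = np.zeros(m), np.zeros(n)  # predictions = last regrets
    X, Y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        # Predictive step: act from the positively truncated predicted regrets.
        x = normalize(np.maximum(zx + rx_prev, 0.0))
        y = normalize(np.maximum(zy + ry_prev, 0.0))
        ux = A @ y              # row player's payoff per action
        uy = -(A.T @ x)         # column player's payoff per action (zero-sum)
        rx = ux - x @ ux        # instantaneous regrets vs. the played strategy
        ry = uy - y @ uy
        zx = np.maximum(zx + rx, 0.0)   # the RM+ update: truncate at zero
        zy = np.maximum(zy + ry, 0.0)
        rx_prev, ry_prev = rx, ry
        X += x
        Y += y
    return X / T, Y / T

# Toy 2x2 game; its unique equilibrium mixes 0.4/0.6 for both players.
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = prm_plus(A)
# Duality gap (exploitability) of the average strategies.
gap = (A @ y_bar).max() - (A.T @ x_bar).min()
```

The "basic flaw" the abstract identifies lives in `zx`/`zy`: their norm can shrink under the truncation, which (read as an inverse learning rate) destabilizes the dynamics; IREG-PRM$^+$ is designed so that this norm never decreases.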
Related papers
- Scale-Invariant Fast Convergence in Games [67.02769061793619]
We develop learning dynamics that achieve fast convergence while being both scale-free and scale-invariant. For two-player zero-sum games, we obtain scale-free and scale-invariant dynamics with external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$. For multiplayer general-sum games, scale-free learning is also enabled by a technique called doubling clipping, which clips observed gradients based on past observations.
arXiv Detail & Related papers (2026-02-12T11:57:20Z) - Convergence of Regret Matching in Potential Games and Constrained Optimization [85.55969013318627]
We show that alternating RM$^+$ converges to an $\epsilon$-KKT point after $O_\epsilon(1/\epsilon^4)$ iterations, establishing for the first time that it is a sound and fast first-order method. Our lower bound shows that convergence to coarse correlated equilibria in potential games is exponentially faster than convergence to Nash equilibria.
arXiv Detail & Related papers (2025-10-20T00:45:47Z) - p-Mean Regret for Stochastic Bandits [52.828710025519996]
We introduce a simple, unified UCB-based algorithm that achieves novel $p$-mean regret bounds. Our framework encompasses both average cumulative regret and Nash regret as special cases.
arXiv Detail & Related papers (2024-12-14T08:38:26Z) - Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits [12.674929126684528]
We address the problem of combinatorial semi-bandits, where a player selects among $P$ actions from the power set of a set containing $d$ base items.
We show that our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches.
arXiv Detail & Related papers (2024-02-23T08:07:54Z) - Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games [71.73971094342349]
We study last-iterate convergence properties of algorithms for solving two-player zero-sum games based on Regret Matching$^+$ (RM$^+$). Our analysis builds on a new characterization of the geometric structure of the limit points of our algorithms, marking a significant departure from most of the literature on last-iterate convergence.
arXiv Detail & Related papers (2023-11-01T17:34:58Z) - Mirror Natural Evolution Strategies [10.495496415022064]
We focus on the theory of zeroth-order optimization, which uses zeroth-order queries to approximate both first-order and second-order information.
We show that the estimated covariance matrix of \texttt{MiNES} converges to the inverse of the Hessian matrix of the objective function.
arXiv Detail & Related papers (2023-08-01T11:45:24Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent
Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We analyze a training algorithm for distributed computation workers with varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^2\epsilon^{-2} + \tau_{\mathrm{avg}}\epsilon^{-1})$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Fast Rates for Nonparametric Online Learning: From Realizability to
Learning in Games [36.969021834291745]
We propose a proper learning algorithm which gets a near-optimal mistake bound in terms of the sequential fat-shattering dimension of the hypothesis class.
This result answers a question as to whether proper learners could achieve near-optimal mistake bounds.
For the real-valued (regression) setting, the optimal mistake bound was not even known for improper learners.
arXiv Detail & Related papers (2021-11-17T05:24:21Z) - Tight last-iterate convergence rates for no-regret learning in
multi-player games [31.604950465209335]
We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ in smooth monotone games.
We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which include OG as a special case.
arXiv Detail & Related papers (2020-10-26T17:06:19Z)
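For readers unfamiliar with the optimistic gradient method referenced above, a minimal sketch on the bilinear saddle point $\min_x \max_y xy$ (a toy smooth monotone game, chosen here only for illustration) shows the core idea: each player steps along the extrapolated gradient $2g_t - g_{t-1}$, and this correction is what yields last-iterate convergence where plain gradient descent-ascent spirals away from the equilibrium.

```python
# Optimistic gradient (OG) self-play on the bilinear game min_x max_y f(x, y) = x*y.
# Plain simultaneous gradient descent-ascent cycles/diverges around the saddle
# point (0, 0); OG's extrapolated gradient 2*g_t - g_{t-1} damps the rotation,
# so the *last* iterate converges. Toy illustration, not the paper's analysis.
eta = 0.2                       # constant step size (must be small enough)
x, y = 1.0, 1.0                 # start away from the equilibrium (0, 0)
gx_prev, gy_prev = 0.0, 0.0     # gradients observed in the previous round
for _ in range(500):
    gx, gy = y, x               # df/dx = y,  df/dy = x
    x -= eta * (2 * gx - gx_prev)   # descent step for the min player
    y += eta * (2 * gy - gy_prev)   # ascent step for the max player
    gx_prev, gy_prev = gx, gy
dist = (x * x + y * y) ** 0.5   # distance of the last iterate to the saddle
```

Replacing the extrapolated gradient with the plain gradient (`x -= eta * gx; y += eta * gy`) makes `dist` grow rather than shrink, which is the qualitative gap the last-iterate literature above addresses.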
This list is automatically generated from the titles and abstracts of the papers in this site.