Fugu-MT 論文翻訳(概要): Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games

論文の概要: Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games

arxiv url: http://arxiv.org/abs/2510.04407v1
Date: Mon, 06 Oct 2025 00:33:20 GMT
ステータス: 翻訳完了
システム内更新日: 2025-10-07 16:52:59.635541
Title: Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games
Title（参考訳）: スケール不変レグレストマッチングと最適収束によるオンライン学習:ゼロサムゲームにおけるブリッジ理論と実践
Authors: Brian Hu Zhang, Ioannis Anagnostides, Tuomas Sandholm,
Abstract要約: ゼロサムゲームにおける理論と実践の間、何十年にもわたってかなりのシャズムが一階法によって浸食されてきた。我々は、IREG-PRM$+$と呼ぶPRM$+$の新しいスケール不変かつパラメータフリーな変種を提案する。ベンチマークゲームでは, PRM$+$と同等でありながら, 最適収束保証を$T-1/2$, $T-1$とする。
参考スコア（独自算出の注目度）: 60.871651115241406
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. Although a convergence rate of $T^{-1}$ has long been established since Nemirovski's mirror-prox algorithm and Nesterov's excessive gap technique in the early 2000s, the most effective paradigm in practice is *counterfactual regret minimization*, which is based on *regret matching* and its modern variants. In particular, the state of the art across most benchmarks is *predictive* regret matching$^+$ (PRM$^+$), in conjunction with non-uniform averaging. Yet, such algorithms can exhibit slower $\Omega(T^{-1/2})$ convergence even in self-play. In this paper, we close the gap between theory and practice. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ (i.e., optimal) average-iterate convergence guarantees, while also being on par with PRM$^+$ on benchmark games. From a technical standpoint, we draw an analogy between IREG-PRM$^+$ and optimistic gradient descent with *adaptive* learning rate. The basic flaw of PRM$^+$ is that the ($\ell_2$-)norm of the regret vector -- which can be thought of as the inverse of the learning rate -- can decrease. By contrast, we design IREG-PRM$^+$ so as to maintain the invariance that the norm of the regret vector is nondecreasing. This enables us to derive an RVU-type bound for IREG-PRM$^+$, the first such property that does not rely on introducing additional hyperparameters to enforce smoothness. Furthermore, we find that IREG-PRM$^+$ performs on par with an adaptive version of optimistic gradient descent that we introduce whose learning rate depends on the misprediction error, demystifying the effectiveness of the regret matching family *vis-a-vis* more standard optimization techniques.
Abstract（参考訳）: ゼロサムゲームにおける理論と実践の間、何十年にもわたってかなりのシャズムが一階法によって浸食されてきた。 T^{-1}$の収束率はネミロフスキーのミラープロキシアルゴリズムとネステロフの2000年代初頭の過度なギャップ技法以来長い間確立されてきたが、実際には *counterfactual regret minimization* であり、これは*regret matching* とその現代の変種に基づいている。特に、ほとんどのベンチマークにおける最先端は*予測的*後悔のマッチング$^+$(PRM$^+$)であり、非一様平均化と合わせている。しかし、そのようなアルゴリズムは自己再生においても遅い$\Omega(T^{-1/2})$収束を示すことができる。本稿では,理論と実践のギャップを埋める。 IREG-PRM$^+$ と呼ぶ新しいスケール不変かつパラメータフリーな PRM$^+$ を提案する。我々は、ベンチマークゲームにおいて、PRM$^+$と同等でありながら、$T^{-1/2}$ベストレートと$T^{-1}$(すなわち、最適)平均収束保証を達成することを示す。技術的観点から、IREG-PRM$^+$と*適応的*学習率による楽観的勾配勾配の類似性を描く。 PRM$^+$の基本的な欠点は、後悔ベクトル(学習率の逆と見なすことができる)の($\ell_2$-)ノルムが減少できることである。対照的に IREG-PRM$^+$ は、後悔ベクトルのノルムが非減少であるという不変性を維持するために設計する。これにより IREG-PRM$^+$ に対する RVU 型バウンダリを導出できる。さらに、IREG-PRM$^+$は、誤り予測誤差に依存する楽観的勾配勾配の適応バージョンと同等に動作し、より標準的な最適化手法として、後悔の一致したファミリー*vis-a-vis*の有効性をデミストする。

論文の概要: Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games

関連論文リスト