Last-Iterate Convergence with Full and Noisy Feedback in Two-Player
Zero-Sum Games
- URL: http://arxiv.org/abs/2208.09855v3
- Date: Fri, 26 May 2023 04:50:50 GMT
- Title: Last-Iterate Convergence with Full and Noisy Feedback in Two-Player
Zero-Sum Games
- Authors: Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi
Iwasaki
- Abstract summary: We show that M2WU converges to an exact Nash equilibrium by iteratively adapting the mutation term.
We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates.
- Score: 8.037452358542465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for
learning an equilibrium in two-player zero-sum normal-form games and proves
that it exhibits the last-iterate convergence property in both full and noisy
feedback settings. In the former, players observe their exact gradient vectors
of the utility functions. In the latter, they only observe the noisy gradient
vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic
MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy
feedback. In contrast, M2WU exhibits last-iterate convergence to a
stationary point near a Nash equilibrium in both feedback settings. We then
prove that it converges to an exact Nash equilibrium by iteratively adapting
the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in
exploitability and convergence rates.
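The mutation mechanism described above can be sketched as an ordinary multiplicative-weights step with an extra term pulling each strategy toward a fixed reference point. The specific mutation form `mu * (ref/x - 1)`, the uniform reference strategy, and the step sizes below are illustrative assumptions rather than the paper's exact update:

```python
import math

# Row player's payoff matrix for matching pennies (zero-sum: column player gets -A).
A = [[1.0, -1.0], [-1.0, 1.0]]

def grad_row(y):
    # Gradient of the row player's expected payoff: (A y)_a.
    return [sum(A[a][b] * y[b] for b in range(2)) for a in range(2)]

def grad_col(x):
    # Gradient of the column player's expected payoff: -(x^T A)_b.
    return [-sum(x[a] * A[a][b] for a in range(2)) for b in range(2)]

def mutated_mwu_step(x, g, eta, mu, ref=(0.5, 0.5)):
    # MWU step on the payoff gradient plus a mutation term toward `ref`
    # (an assumed illustrative form, not the paper's exact rule).
    w = [x[a] * math.exp(eta * (g[a] + mu * (ref[a] / x[a] - 1.0)))
         for a in range(len(x))]
    s = sum(w)
    return [v / s for v in w]

def exploitability(x, y):
    # Sum of both players' best-response gains; zero exactly at a Nash equilibrium.
    gr, gc = grad_row(y), grad_col(x)
    return (max(gr) - sum(x[a] * gr[a] for a in range(2))
            + max(gc) - sum(y[b] * gc[b] for b in range(2)))

x, y = [0.9, 0.1], [0.2, 0.8]
for _ in range(2000):
    gx, gy = grad_row(y), grad_col(x)  # full (exact) gradient feedback
    x = mutated_mwu_step(x, gx, eta=0.05, mu=0.2)
    y = mutated_mwu_step(y, gy, eta=0.05, mu=0.2)
# The iterates themselves (not their averages) settle near the uniform equilibrium.
```

Setting `mu=0.0` recovers vanilla MWU, which is known to cycle around the interior equilibrium of matching pennies rather than converge; the mutation term is what damps that rotation toward a stationary point.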
Related papers
- Games played by Exponential Weights Algorithms [0.0]
We consider a repeated interaction in discrete time, where each player uses an exponential weights algorithm characterized by an initial mixed action and a fixed learning rate.
We show that whenever a strict Nash equilibrium exists, the probability of playing a strict Nash equilibrium at the next stage converges almost surely to 0 or 1.
arXiv Detail & Related papers (2024-07-09T08:49:51Z)
- Fast Last-Iterate Convergence of Learning in Games Requires Forgetful Algorithms [71.73971094342349]
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games.
We show that OMWU offers several advantages including logarithmic dependence on the size of the payoff matrix.
We prove that a broad class of algorithms that do not forget the past quickly all suffer the same issue.
arXiv Detail & Related papers (2024-06-15T13:26:17Z)
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- Payoff-based learning with matrix multiplicative weights in quantum games [35.111876815522116]
We study the problem of learning in quantum games - and other classes of semidefinite games - with scalar, payoff-based feedback.
We introduce a suite of minimal-information matrix multiplicative weights (3MW) methods tailored to different information frameworks.
We show that the 3MW method with deterministic payoff feedback retains the convergence rate of the vanilla, full information MMW algorithm in quantum min-max games.
arXiv Detail & Related papers (2023-11-04T14:56:17Z)
- Differentiable Arbitrating in Zero-sum Markov Games [59.62061049680365]
We study how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating.
The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way.
We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level.
arXiv Detail & Related papers (2023-02-20T16:05:04Z)
- Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE)
We show that computing NE for this class of games is hard for the complexity class CLS.
arXiv Detail & Related papers (2021-11-07T21:15:35Z)
- Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures [8.08640000394814]
We introduce the second such algorithm, Consensus MWU, for which we prove local convergence and show empirically that it enjoys faster and more robust convergence than OMWU.
Our algorithm shows the importance of a new object, the simplex Hessian, as well as of the interaction of the game with the (eigen)space of vectors summing to zero.
arXiv Detail & Related papers (2021-06-04T17:26:54Z)
- Linear Last-iterate Convergence in Constrained Saddle-point Optimization [48.44657553192801]
We significantly expand the understanding of last-iterate convergence for Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU).
We show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant.
We show that bilinear games over any polytope satisfy this condition and OGDA converges exponentially fast even without the unique equilibrium assumption.
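As a point of comparison for the optimistic updates analyzed here, a minimal OMWU loop can be sketched on matching pennies (a 2x2 zero-sum game with a unique interior equilibrium, not the paper's general polytope setting); the step size and iteration count are arbitrary illustrative choices:

```python
import math

A = [[1.0, -1.0], [-1.0, 1.0]]  # row player's payoffs in matching pennies

def grad_row(y):
    # Row player's payoff gradient: (A y)_a.
    return [sum(A[a][b] * y[b] for b in range(2)) for a in range(2)]

def grad_col(x):
    # Column player's payoff gradient: -(x^T A)_b.
    return [-sum(x[a] * A[a][b] for a in range(2)) for b in range(2)]

def omwu_step(x, g, g_prev, eta):
    # Optimistic MWU: step on the extrapolated gradient 2*g_t - g_{t-1}.
    w = [x[a] * math.exp(eta * (2.0 * g[a] - g_prev[a])) for a in range(len(x))]
    s = sum(w)
    return [v / s for v in w]

def exploitability(x, y):
    # Sum of both players' best-response gains; zero at a Nash equilibrium.
    gr, gc = grad_row(y), grad_col(x)
    return (max(gr) - sum(x[a] * gr[a] for a in range(2))
            + max(gc) - sum(y[b] * gc[b] for b in range(2)))

x, y = [0.9, 0.1], [0.2, 0.8]
gx_prev, gy_prev = grad_row(y), grad_col(x)
for _ in range(10000):
    gx, gy = grad_row(y), grad_col(x)  # simultaneous updates
    x = omwu_step(x, gx, gx_prev, eta=0.05)
    y = omwu_step(y, gy, gy_prev, eta=0.05)
    gx_prev, gy_prev = gx, gy
```

Replacing the extrapolated gradient `2*g_t - g_{t-1}` with the plain gradient `g_t` gives vanilla MWU, whose last iterate orbits the equilibrium here; the one-step recency bias is what makes the last iterate spiral inward.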
arXiv Detail & Related papers (2020-06-16T20:53:04Z)
- Chaos, Extremism and Optimism: Volume Analysis of Learning in Games [55.24050445142637]
We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games.
We show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior.
We also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts.
arXiv Detail & Related papers (2020-05-28T13:47:09Z)
- From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization [49.368421783733815]
We show how adapting the reward can give strong convergence guarantees in monotone games.
We also show how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium.
arXiv Detail & Related papers (2020-02-19T21:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.