Consensus Multiplicative Weights Update: Learning to Learn using
Projector-based Game Signatures
- URL: http://arxiv.org/abs/2106.02615v1
- Date: Fri, 4 Jun 2021 17:26:54 GMT
- Title: Consensus Multiplicative Weights Update: Learning to Learn using
Projector-based Game Signatures
- Authors: Nelson Vadori, Rahul Savani, Thomas Spooner, Sumitra Ganesh
- Abstract summary: We introduce the second such algorithm, \textit{Consensus MWU}, for which we prove local convergence and show empirically that it enjoys faster and more robust convergence than OMWU.
Our algorithm shows the importance of a new object, the \textit{simplex Hessian}, as well as of the interaction of the game with the (eigen)space of vectors summing to zero.
- Score: 8.08640000394814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Optimistic Multiplicative Weights Update (OMWU) was proven to be
the first constant step-size algorithm in the online no-regret framework to
enjoy last-iterate convergence to Nash Equilibria in the constrained zero-sum
bimatrix case, where weights represent the probabilities of playing pure
strategies. We introduce the second such algorithm, \textit{Consensus MWU}, for
which we prove local convergence and show empirically that it enjoys faster and
more robust convergence than OMWU. Our algorithm shows the importance of a new
object, the \textit{simplex Hessian}, as well as of the interaction of the game
with the (eigen)space of vectors summing to zero, which we believe future
research can build on. As for OMWU, CMWU has convergence guarantees in the
zero-sum case only, but Cheung and Piliouras (2020) recently showed that OMWU
and MWU display opposite convergence properties depending on whether the game
is zero-sum or cooperative. Inspired by this work and the recent literature on
learning to optimize for single functions, we extend CMWU to non zero-sum games
by introducing a new framework for online learning in games, where the update
rule's gradient and Hessian coefficients along a trajectory are learnt by a
reinforcement learning policy that is conditioned on the nature of the game:
\textit{the game signature}. We construct the latter using a new canonical
decomposition of two-player games into eight components corresponding to
commutative projection operators, generalizing and unifying recent game
concepts studied in the literature. We show empirically that our new learning
policy is able to exploit the game signature across a wide range of game types.
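As background for the abstract above, the sketch below shows the standard MWU and Optimistic MWU (OMWU) updates for a constrained zero-sum bimatrix game x^T A y, where the row player minimizes and the column player maximizes. This is only an illustrative sketch under standard definitions: it is not the paper's Consensus MWU, whose update rule is not spelled out in the abstract, and the payoff matrix, step size, and function names (mwu_step, omwu_step) are assumptions made for the example. The "(eigen)space of vectors summing to zero" mentioned above can be read as the range of the centering projector I - (1/n) 1 1^T, i.e. the tangent space of the probability simplex.

```python
# Minimal, illustrative sketch (NOT the paper's Consensus MWU): standard MWU and
# OMWU updates for a zero-sum bimatrix game x^T A y, with the row player (x)
# minimizing and the column player (y) maximizing. Names and constants are
# assumptions made for this example.
import numpy as np

def mwu_step(x, y, A, eta):
    """One simultaneous Multiplicative Weights Update step."""
    x_new = x * np.exp(-eta * (A @ y))     # row player descends along A y
    y_new = y * np.exp(+eta * (A.T @ x))   # column player ascends along A^T x
    return x_new / x_new.sum(), y_new / y_new.sum()

def omwu_step(x, y, x_prev, y_prev, A, eta):
    """One Optimistic MWU step: uses the extrapolated gradient 2*g_t - g_{t-1}."""
    gx = 2 * (A @ y) - (A @ y_prev)        # optimistic payoff estimate, row player
    gy = 2 * (A.T @ x) - (A.T @ x_prev)    # optimistic payoff estimate, column player
    x_new = x * np.exp(-eta * gx)
    y_new = y * np.exp(+eta * gy)
    return x_new / x_new.sum(), y_new / y_new.sum()

# Toy run on matching pennies: OMWU's last iterate approaches the uniform Nash
# equilibrium, whereas plain MWU's last iterate is known to cycle/drift away from it.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
eta = 0.1
x = y = np.array([0.7, 0.3])
xp, yp = x, y
for _ in range(2000):
    (x, y), (xp, yp) = omwu_step(x, y, xp, yp, A, eta), (x, y)
print(x, y)  # both should be close to [0.5, 0.5]
```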
Related papers
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Policy Optimization for Markov Games: Unified Framework and Faster Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general-sum Markov Games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE)
We show that computing NE for this class of games is $\textit{hard}$ for the complexity class $\mathrm{CLS}$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z)
- Online Multiobjective Minimax Optimization and Applications [14.699969822572308]
We introduce a simple but general online learning framework in which, at every round, an adaptive adversary introduces a new game.
The learner's goal is to play so as to minimize the maximum coordinate of the cumulative vector-valued loss.
We give a simple algorithm that can compete with the setting in which the adversary must announce their action first.
This includes no-regret learning: we can recover optimal algorithms and bounds for minimizing external regret, internal regret, adaptive regret, multigroup regret, subsequence regret, and a notion of regret in the sleeping experts setting.
arXiv Detail & Related papers (2021-08-09T06:52:08Z)
- Understanding Modern Techniques in Optimization: Frank-Wolfe, Nesterov's Momentum, and Polyak's Momentum [8.515692980023948]
We develop a modular framework that can serve as a recipe for constructing and analyzing iterative algorithms for convex optimization.
We show that our approach leads to three new fast Frank-Wolfe Nesterov algorithms for some constraint sets.
In the second part, we develop a modular analysis of Polyak momentum for certain problems.
arXiv Detail & Related papers (2021-06-23T17:53:39Z)
- Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)
- Chaos, Extremism and Optimism: Volume Analysis of Learning in Games [55.24050445142637]
We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games.
We show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior.
We also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts.
arXiv Detail & Related papers (2020-05-28T13:47:09Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.