Optimal No-Regret Learning in General Games: Bounded Regret with
Unbounded Step-Sizes via Clairvoyant MWU
- URL: http://arxiv.org/abs/2111.14737v1
- Date: Mon, 29 Nov 2021 17:42:24 GMT
- Authors: Georgios Piliouras, Ryann Sim, Stratis Skoulakis
- Abstract summary: We provide an algorithm that achieves constant regret with fixed step-sizes.
The cumulative regret of our algorithm provably decreases linearly as the step-size increases.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we solve the problem of no-regret learning in general games.
Specifically, we provide a simple and practical algorithm that achieves
constant regret with fixed step-sizes. The cumulative regret of our algorithm
provably decreases linearly as the step-size increases. Our findings depart
from the prevailing paradigm that vanishing step-sizes are a prerequisite for
low regret as championed by all state-of-the-art methods to date.
We shift away from this paradigm by defining a novel algorithm that we call
Clairvoyant Multiplicative Weights Updates (CMWU). CMWU is Multiplicative
Weights Updates (MWU) equipped with a mental model (jointly shared across all
agents) about the state of the system in its next period. Each agent records
its mixed strategy, i.e., its belief about what it expects to play in the next
period, in this shared mental model which is internally updated using MWU
without any changes to the real-world behavior up until it equilibrates, thus
marking its consistency with the next day's real-world outcome. It is then and
only then that agents take action in the real-world, effectively doing so with
the "full knowledge" of the state of the system on the next day, i.e., they
are clairvoyant. CMWU effectively acts as MWU with one day look-ahead,
achieving bounded regret. At a technical level, we establish that
self-consistent mental models exist for any choice of step-sizes and provide
bounds on the step-size under which their uniqueness and linear-time
computation are guaranteed via contraction mapping arguments. Our arguments
extend well beyond normal-form games with little effort.
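The fixed-point computation the abstract describes can be sketched for a two-player bimatrix game: iterate MWU on the shared mental model until it equilibrates, then commit the resulting strategies. This is an illustrative reconstruction from the abstract, not the authors' code; `mwu_step` and `cmwu_round` are hypothetical names.

```python
import numpy as np

def mwu_step(x, payoff, eta):
    """One multiplicative-weights update of mixed strategy x."""
    w = x * np.exp(eta * payoff)
    return w / w.sum()

def cmwu_round(x, y, A, B, eta, iters=200, tol=1e-12):
    """One round of Clairvoyant MWU for a bimatrix game (A, B).

    The agents iterate MWU on a shared "mental model" (u, v) of next
    round's play until it equilibrates, i.e. reaches the fixed point
        u = MWU(x, A v),   v = MWU(y, B^T u),
    and only then commit (u, v) as their real-world strategies.
    """
    u, v = x.copy(), y.copy()
    for _ in range(iters):
        u_new = mwu_step(x, A @ v, eta)
        v_new = mwu_step(y, B.T @ u, eta)
        if max(np.abs(u_new - u).max(), np.abs(v_new - v).max()) < tol:
            return u_new, v_new
        u, v = u_new, v_new
    return u, v
```

For small enough eta the inner map is a contraction (the paper's uniqueness and linear-time-computation regime), so the loop converges quickly; for larger eta a fixed point still exists but may require more care to find.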
Related papers
- Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping [14.435637320909663]
The MoE technique plays a crucial role in scaling up DNN model parameter counts.
Existing methods attempt to mitigate this issue by overlapping all-to-all with expert computation.
In our study, we extend the scope of this challenge by considering overlap at the broader training graph level.
We implement these techniques in Lancet, a system using compiler-based optimization to automatically enhance MoE model training.
arXiv Detail & Related papers (2024-04-30T10:17:21Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
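For context, a single optimistic gradient step over the probability simplex might look as follows; this is a generic illustration of OGD, not that paper's exact setup, and `project_simplex`/`ogd_step` are hypothetical names.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def ogd_step(x, grad, prev_grad, eta):
    """Optimistic gradient ascent: use 2*g_t - g_{t-1} as a
    one-step prediction of the next gradient before projecting."""
    return project_simplex(x + eta * (2.0 * grad - prev_grad))
```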
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Regret Minimization and Convergence to Equilibria in General-sum Markov Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures [8.08640000394814]
We introduce the second such algorithm, Consensus MWU, for which we prove local convergence and show empirically that it enjoys faster and more robust convergence than OMWU.
Our algorithm shows the importance of a new object, the simplex Hessian, as well as of the interaction of the game with the (eigen)space of vectors summing to zero.
arXiv Detail & Related papers (2021-06-04T17:26:54Z)
- Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
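The local update at the heart of CFR is regret matching: play each action in proportion to its positive cumulative regret. A minimal generic sketch (not that paper's specific algorithm; `regret_matching` is an illustrative name):

```python
import numpy as np

def regret_matching(cum_regret):
    """Strategy proportional to positive cumulative regrets;
    uniform when no action has positive regret."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    if total > 0.0:
        return pos / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))
```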
arXiv Detail & Related papers (2021-03-08T04:03:24Z)
- AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights [53.8489656709356]
Normalization techniques are a boon for modern deep learning.
It is often overlooked, however, that the additional introduction of momentum results in a rapid reduction in effective step sizes for scale-invariant weights.
In this paper, we verify that the widely-adopted combination of the two ingredients leads to the premature decay of effective step sizes and sub-optimal model performance.
arXiv Detail & Related papers (2020-06-15T08:35:15Z)
- Chaos, Extremism and Optimism: Volume Analysis of Learning in Games [55.24050445142637]
We present volume analyses of Multiplicative Weights Updates (MWU) and Optimistic Multiplicative Weights Updates (OMWU) in zero-sum as well as coordination games.
We show that OMWU contracts volume, providing an alternative understanding for its known convergent behavior.
We also prove a no-free-lunch type of theorem, in the sense that when examining coordination games the roles are reversed: OMWU expands volume exponentially fast, whereas MWU contracts.
arXiv Detail & Related papers (2020-05-28T13:47:09Z)
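The MWU and OMWU dynamics compared in the volume analysis differ only in the payoff term fed into the exponential reweighting; a hedged sketch (function names are illustrative):

```python
import numpy as np

def mwu(x, u, eta):
    """Vanilla MWU: reweight by the current payoff vector u."""
    w = x * np.exp(eta * u)
    return w / w.sum()

def omwu(x, u, u_prev, eta):
    """Optimistic MWU: reweight by the extrapolated payoff
    2*u - u_prev, a one-step prediction of the next payoff."""
    w = x * np.exp(eta * (2.0 * u - u_prev))
    return w / w.sum()
```

When the payoff vector has not changed since the last round (u_prev == u), the optimistic correction vanishes and OMWU coincides with MWU.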
This list is automatically generated from the titles and abstracts of the papers in this site.