Guarantees for Self-Play in Multiplayer Games via Polymatrix
Decomposability
- URL: http://arxiv.org/abs/2310.11518v3
- Date: Wed, 29 Nov 2023 17:39:17 GMT
- Title: Guarantees for Self-Play in Multiplayer Games via Polymatrix
Decomposability
- Authors: Revan MacQueen, James R. Wright
- Abstract summary: Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself.
In two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent; we extend this guarantee to multiplayer games that approximately decompose into two-player constant-sum subgames.
For the first time, our results identify a structural property of multiplayer games that enables performance guarantees for the strategies produced by a broad class of self-play algorithms.
- Score: 2.2636685010313364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-play is a technique for machine learning in multi-agent systems where a
learning algorithm learns by interacting with copies of itself. Self-play is
useful for generating large quantities of data for learning, but has the
drawback that the agents the learner will face post-training may have
dramatically different behavior than the learner came to expect by interacting
with itself. For the special case of two-player constant-sum games, self-play
that reaches Nash equilibrium is guaranteed to produce strategies that perform
well against any post-training opponent; however, no such guarantee exists for
multiplayer games. We show that in games that approximately decompose into a
set of two-player constant-sum games (called constant-sum polymatrix games)
where global $\epsilon$-Nash equilibria are boundedly far from Nash equilibria
in each subgame (called subgame stability), any no-external-regret algorithm
that learns by self-play will produce a strategy with bounded vulnerability.
For the first time, our results identify a structural property of multiplayer
games that enables performance guarantees for the strategies produced by a broad
class of self-play algorithms. We demonstrate our findings through experiments
on Leduc poker.
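To make the learning setup concrete, below is a minimal, hypothetical sketch of no-external-regret self-play, implemented via regret matching, in a three-player constant-sum polymatrix game whose pairwise subgames are matching pennies. The specific game, payoffs, and horizon are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch: no-external-regret self-play (regret matching) in a
# 3-player constant-sum polymatrix game. The pairwise subgames, payoffs,
# and horizon are illustrative assumptions, not the paper's setup.
import numpy as np

n_players, n_actions, T = 3, 2, 20000

# Each edge (i, j) is a two-player zero-sum subgame: A[(i, j)] is player i's
# payoff matrix against j, and A[(j, i)] = -A[(i, j)].T keeps the edge zero-sum.
mp = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies
A = {}
for i in range(n_players):
    for j in range(i + 1, n_players):
        A[(i, j)] = mp
        A[(j, i)] = -mp.T

regrets = np.zeros((n_players, n_actions))
strategy_sums = np.zeros((n_players, n_actions))

def rm_strategy(r):
    """Regret matching: mix proportionally to positive accumulated regret."""
    pos = np.maximum(r, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(len(r), 1.0 / len(r))

for t in range(T):
    strats = [rm_strategy(regrets[i]) for i in range(n_players)]
    for i in range(n_players):
        # Player i's expected payoff per action is the sum over polymatrix edges.
        action_vals = sum(A[(i, j)] @ strats[j]
                          for j in range(n_players) if j != i)
        expected = strats[i] @ action_vals
        regrets[i] += action_vals - expected  # external-regret accumulation
        strategy_sums[i] += strats[i]

print("average self-play strategies:\n", strategy_sums / T)
```

In this symmetric example the average strategies approach uniform play; the paper's guarantee is that, under subgame stability, the output of such no-external-regret self-play has bounded vulnerability against arbitrary post-training opponents.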
Related papers
- Finite-Sample Guarantees for Best-Response Learning Dynamics in Zero-Sum Matrix Games [22.380293155135096]
We study best-response type learning dynamics for two player zero-sum matrix games.
We consider two settings, distinguished by the type of information each player has about the game and their opponent's strategy (a toy sketch of best-response dynamics follows this entry).
arXiv Detail & Related papers (2024-07-29T15:56:49Z)
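As a toy illustration of best-response-type dynamics (a simplified fictitious-play variant assumed here, not the paper's finite-sample setting), each player repeatedly best responds to the opponent's empirical strategy:

```python
# Toy sketch of best-response dynamics (fictitious play) in a two-player
# zero-sum matrix game; a simplified illustration, not the paper's algorithm.
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's payoffs (matching pennies)
row_counts = np.ones(2)  # empirical action counts (ones avoid division by zero)
col_counts = np.ones(2)

for t in range(10000):
    # Each player best responds to the opponent's empirical mixed strategy.
    row_action = np.argmax(A @ (col_counts / col_counts.sum()))  # row maximizes
    col_action = np.argmin((row_counts / row_counts.sum()) @ A)  # column minimizes
    row_counts[row_action] += 1
    col_counts[col_action] += 1

print("empirical strategies:",
      row_counts / row_counts.sum(), col_counts / col_counts.sum())
```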
- Securing Equal Share: A Principled Approach for Learning Multiplayer Symmetric Games [21.168085154982712]
Equilibria in multiplayer games are neither unique nor non-exploitable.
This paper takes an initial step towards addressing these challenges by focusing on the natural objective of equal share.
We design a series of efficient algorithms, inspired by no-regret learning, that provably attain approximate equal share across various settings.
arXiv Detail & Related papers (2024-06-06T15:59:17Z)
- All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, agents must learn decisions that maximize their own goals while simultaneously minimizing their adversaries'.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, map the strategies of specific opponents, and learn how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z)
- Collusion Detection in Team-Based Multiplayer Games [57.153233321515984]
We propose a system that detects colluding behaviors in team-based multiplayer games.
The proposed method analyzes the players' social relationships paired with their in-game behavioral patterns.
We then automate the detection using Isolation Forest, an unsupervised learning technique specialized in highlighting outliers (see the sketch after this entry).
arXiv Detail & Related papers (2022-03-10T02:37:39Z)
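A minimal sketch of that outlier-detection step using scikit-learn's IsolationForest; the per-pair features and synthetic data below are made up for illustration and are not the paper's pipeline:

```python
# Hypothetical sketch of the anomaly-detection step with scikit-learn's
# IsolationForest; features and data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# One row per player pair, e.g. [social-tie strength, mutual non-aggression
# rate, match co-occurrence] -- illustrative feature names only.
normal_pairs = rng.normal(loc=[0.2, 0.1, 0.3], scale=0.05, size=(500, 3))
suspect_pairs = rng.normal(loc=[0.9, 0.8, 0.9], scale=0.05, size=(5, 3))
X = np.vstack([normal_pairs, suspect_pairs])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = detector.predict(X)  # -1 marks outliers: candidate colluding pairs
print("flagged pair indices:", np.where(flags == -1)[0])
```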
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure: a meta-algorithm learns over a group of black-box base-learners satisfying a certain property (a rough sketch follows this entry).
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
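A rough sketch of such a two-layer structure, assumed here as multiplicative weights (Hedge) both at the meta level and inside each black-box base-learner; the losses and learning rates are placeholders, not the paper's construction:

```python
# Rough sketch of a two-layer no-regret learner: a Hedge meta-algorithm
# aggregates black-box base-learners with different learning rates.
# Losses and parameters are placeholders, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
n_actions, T, eta = 3, 5000, 0.05

class HedgeBase:
    """Black-box base-learner: multiplicative weights with its own rate."""
    def __init__(self, lr):
        self.lr, self.w = lr, np.ones(n_actions)
    def play(self):
        return self.w / self.w.sum()
    def update(self, loss_vec):
        self.w *= np.exp(-self.lr * loss_vec)

base = [HedgeBase(lr) for lr in (0.01, 0.1, 1.0)]
meta_w = np.ones(len(base))
total_loss = 0.0

for t in range(T):
    loss_vec = rng.uniform(0, 1, n_actions)     # stand-in for time-varying game losses
    plays = np.array([b.play() for b in base])  # each base-learner's advice
    mixed = (meta_w / meta_w.sum()) @ plays     # meta-level mixture over advice
    total_loss += mixed @ loss_vec
    meta_w *= np.exp(-eta * (plays @ loss_vec))  # meta update on base losses
    for b in base:
        b.update(loss_vec)

print("meta weights:", meta_w / meta_w.sum(), " avg loss:", total_loss / T)
```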
- Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games [31.97631243571394]
We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or better than, hand-designed baselines.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO.
arXiv Detail & Related papers (2021-06-04T22:30:25Z)
- Efficient Competitive Self-Play Policy Optimization [20.023522000925094]
We propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games.
Our method trains several agents simultaneously and intelligently pairs them as opponents according to simple adversarial rules.
We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games.
arXiv Detail & Related papers (2020-09-13T21:01:38Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences arising from its use.