Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games
- URL: http://arxiv.org/abs/2106.02745v1
- Date: Fri, 4 Jun 2021 22:30:25 GMT
- Title: Discovering Multi-Agent Auto-Curricula in Two-Player Zero-Sum Games
- Authors: Xidong Feng, Oliver Slumbers, Yaodong Yang, Ziyu Wan, Bo Liu, Stephen
McAleer, Ying Wen, Jun Wang
- Abstract summary: We introduce a framework, LMAC, that automates the discovery of the update rule without explicit human design.
Surprisingly, even without human design, the discovered MARL algorithms achieve performance competitive with, or better than, state-of-the-art population-based game solvers.
We show that LMAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker.
- Score: 31.97631243571394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When solving two-player zero-sum games, multi-agent reinforcement learning
(MARL) algorithms often create populations of agents where, at each iteration,
a new agent is discovered as the best response to a mixture over the opponent
population. Within such a process, the update rules of "who to compete with"
(i.e., the opponent mixture) and "how to beat them" (i.e., finding best
responses) are underpinned by manually developed game theoretical principles
such as fictitious play and Double Oracle. In this paper we introduce a
framework, LMAC, based on meta-gradient descent that automates the discovery of
the update rule without explicit human design. Specifically, we parameterise
the opponent selection module by neural networks and the best-response module
by optimisation subroutines, and update their parameters solely via interaction
with the game engine, where both players aim to minimise their exploitability.
Surprisingly, even without human design, the discovered MARL algorithms achieve
performance competitive with, or even better than, the state-of-the-art
population-based game solvers (e.g., PSRO) on Games of Skill, differentiable
Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker.
Additionally, we show that LMAC is able to generalise from small games to large
games, for example training on Kuhn Poker and outperforming PSRO on Leduc
Poker. Our work inspires a promising future direction to discover general MARL
algorithms solely from data.
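To make the population loop concrete, here is a minimal sketch of a PSRO-style iteration in which the hand-designed meta-solver ("who to compete with") is replaced by a small parameterised scoring function standing in for LMAC's neural opponent-selection module. The random matrix game, the two-parameter scorer, and the finite-difference meta-update are illustrative assumptions of this sketch, not the authors' implementation; the paper instead trains a neural network and obtains the meta-gradient by differentiating through the unrolled loop.

```python
# Illustrative sketch only: an LMAC-style auto-curriculum on a random
# symmetric zero-sum matrix game. "meta_solver" is a hypothetical
# two-parameter stand-in for the paper's neural opponent-selection module,
# and the meta-update uses finite differences where LMAC backpropagates
# the meta-gradient through the unrolled inner loop.
import numpy as np

rng = np.random.default_rng(0)
n = 60                                     # pure strategies in the full game
A = rng.normal(size=(n, n))
A = A - A.T                                # antisymmetric payoffs -> zero-sum game

def full_mixture(pop, sigma):
    mix = np.zeros(n)
    mix[pop] = sigma
    return mix

def exploitability(pop, sigma):
    """Best-response value against the population mixture (0 at equilibrium)."""
    return (A @ full_mixture(pop, sigma)).max()

def meta_solver(theta, pop):
    """'Who to compete with': a softmax over population members, scored by
    their mean restricted payoff and their recency (purely illustrative)."""
    M = A[np.ix_(pop, pop)]                # restricted payoff table
    recency = np.linspace(0.0, 1.0, len(pop))
    scores = theta[0] * M.mean(axis=1) + theta[1] * recency
    e = np.exp(scores - scores.max())
    return e / e.sum()

def inner_loop(theta, iters=8):
    pop = [0]                              # fixed initial strategy
    for _ in range(iters):
        sigma = meta_solver(theta, pop)
        # "How to beat them": best-response oracle (an argmax here; an
        # optimisation subroutine such as gradient ascent in the paper).
        br = int(np.argmax(A @ full_mixture(pop, sigma)))
        if br not in pop:
            pop.append(br)
    return exploitability(pop, meta_solver(theta, pop))

# Outer (meta) loop: adjust the meta-solver parameters to minimise the
# exploitability reached by the inner loop.
theta, lr, eps = np.array([1.0, 0.0]), 0.5, 1e-2
for step in range(20):
    base = inner_loop(theta)
    grad = np.array([(inner_loop(theta + eps * np.eye(2)[i]) - base) / eps
                     for i in range(2)])
    theta -= lr * grad
    print(f"meta-step {step:2d}  exploitability {base:.3f}")
```

The design point the sketch tries to convey is that the only hand-crafted ingredients left are the game interface and the best-response subroutine; the opponent mixture at every iteration is whatever drives the final exploitability down, rather than a fixed rule such as the uniform weighting of fictitious play or the Nash weighting of Double Oracle/PSRO.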
Related papers
- Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities [69.34646544774161]
We formulate a new variant of the multi-player multi-armed bandit (MAB) model, which captures the arrival of requests to each arm and the policy for allocating requests to players.
The challenge is how to design a distributed learning algorithm such that players select arms according to the optimal arm pulling profile.
We design an iterative distributed algorithm, which guarantees that players can arrive at a consensus on the optimal arm pulling profile in only M rounds.
arXiv Detail & Related papers (2024-08-20T13:57:00Z)
- Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents [2.624282086797512]
We introduce Autoverse, an evolvable, domain-specific language for single-player 2D grid-based games.
We demonstrate its use as a scalable training ground for Open-Ended Learning (OEL) algorithms.
arXiv Detail & Related papers (2024-07-05T02:18:02Z)
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium-convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Public Information Representation for Adversarial Team Games [31.29335755664997]
The key difficulty of adversarial team games resides in the asymmetric information available to the team members during play.
Our algorithms convert a sequential team game with adversaries to a classical two-player zero-sum game.
Due to the NP-hard nature of the problem, the resulting Public Team game may be exponentially larger than the original one.
arXiv Detail & Related papers (2022-01-25T15:07:12Z)
- Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games [0.0]
Two player zero sum simultaneous action games are common in video games, financial markets, war, business competition, and many other settings.
We introduce the fundamental concepts of reinforcement learning in two player zero sum simultaneous action games and discuss the unique challenges this type of game poses.
We introduce two novel agents that attempt to handle these challenges by using joint action Deep Q-Networks.
arXiv Detail & Related papers (2021-10-10T16:03:44Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
arXiv Detail & Related papers (2020-06-08T14:33:31Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap (the duality-gap concept is illustrated in the sketch after this list).
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
- Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
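As flagged in the Markov-games entry above, the duality-gap objective (and the exploitability that LMAC minimises) is easiest to see in a plain matrix game. The payoff matrix and strategies below are made-up rock-paper-scissors examples, not the function-approximation algorithm of that paper.

```python
# Illustrative only: duality gap of a strategy pair (x, y) in a two-player
# zero-sum matrix game with payoff matrix A (row player maximises x^T A y).
# gap(x, y) = max_x' x'^T A y - min_y' x^T A y' >= 0, with equality exactly
# at a Nash equilibrium. The matrix and strategies here are made up.
import numpy as np

A = np.array([[ 0.0,  1.0, -1.0],     # rock-paper-scissors payoffs
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])

def duality_gap(A, x, y):
    row_best_response = (A @ y).max()    # best pure reply for the row player
    col_best_response = (A.T @ x).min()  # best pure reply for the column player
    return row_best_response - col_best_response

uniform = np.ones(3) / 3                 # the Nash equilibrium of RPS
biased = np.array([0.6, 0.2, 0.2])

print(duality_gap(A, uniform, uniform))  # ~0.0: neither player can gain
print(duality_gap(A, biased, uniform))   # > 0: the biased row strategy is exploitable
```

Under the common NashConv definition, this gap equals the exploitability of the strategy pair, which is the quantity LMAC's discovered update rule is meta-trained to drive toward zero.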