Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed
Cooperative-Competitive Games
- URL: http://arxiv.org/abs/2310.03354v1
- Date: Thu, 5 Oct 2023 07:19:33 GMT
- Title: Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed
Cooperative-Competitive Games
- Authors: Zelai Xu, Yancheng Liang, Chao Yu, Yu Wang, Yi Wu
- Abstract summary: Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
- Score: 14.979239870856535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-play (SP) is a popular multi-agent reinforcement learning (MARL)
framework for solving competitive games, where each agent optimizes policy by
treating others as part of the environment. Despite the empirical successes,
the theoretical properties of SP-based methods are limited to two-player
zero-sum games. However, for mixed cooperative-competitive games where agents
on the same team need to cooperate with each other, we can show a simple
counter-example where SP-based methods cannot converge to a global Nash
equilibrium (NE) with high probability. Alternatively, Policy-Space Response
Oracles (PSRO) is an iterative framework for learning NE, where the best
responses w.r.t. previous policies are learned in each iteration. PSRO can be
directly extended to mixed cooperative-competitive settings by jointly learning
team best responses with all convergence properties unchanged. However, PSRO
requires repeatedly training joint policies from scratch till convergence,
which makes it hard to scale to complex games. In this work, we develop a novel
algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both
frameworks. FXP simultaneously trains an SP-based main policy and a counter
population of best response policies. The main policy is trained by fictitious
self-play and cross-play against the counter population, while the counter
policies are trained as the best responses to the main policy's past versions.
We validate our method in matrix games and show that FXP converges to global
NEs while SP methods fail. We also conduct experiments in a gridworld domain,
where FXP achieves higher Elo ratings and lower exploitabilities than
baselines, and a more challenging football game, where FXP defeats SOTA models
with over 94% win rate.
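The abstract describes FXP's structure: a main policy trained by fictitious self-play and by cross-play against a counter population, while the counter policies are trained as best responses to past versions of the main policy. The sketch below is a minimal, hypothetical illustration of that loop, not the authors' implementation; the names (Policy, train_fxp, update_against, sp_prob) and the uniform sampling of opponents are assumptions made here for illustration only.

```python
# Illustrative sketch of the FXP training structure described in the abstract.
# All class/function names are hypothetical placeholders, not the paper's code.
import copy
import random


class Policy:
    """Placeholder policy with an in-place RL update against a fixed opponent."""

    def update_against(self, opponent, env):
        # One step of reinforcement learning against `opponent` in `env`
        # (e.g. policy-gradient updates on collected trajectories); omitted here.
        pass

    def snapshot(self):
        # Frozen copy to store as a past version / population member.
        return copy.deepcopy(self)


def train_fxp(env, num_iters, sp_prob=0.5):
    """Hypothetical FXP loop: the main policy mixes fictitious self-play with
    cross-play against a counter population; counter policies best-respond to
    past versions of the main policy."""
    main = Policy()
    main_checkpoints = [main.snapshot()]   # past versions of the main policy
    counter_population = [Policy()]        # counter (best-response) policies

    for _ in range(num_iters):
        # --- Main policy update ---
        if random.random() < sp_prob:
            # Fictitious self-play: train against a past version of itself.
            opponent = random.choice(main_checkpoints)
        else:
            # Cross-play: train against a member of the counter population.
            opponent = random.choice(counter_population)
        main.update_against(opponent, env)

        # --- Counter population update ---
        # Each counter policy is trained as a best response to a past
        # version of the main policy.
        for counter in counter_population:
            target = random.choice(main_checkpoints)
            counter.update_against(target, env)

        main_checkpoints.append(main.snapshot())

    return main
```

As a sketch, the opponent-sampling distribution and the size of the counter population are left as free design choices; the paper's matrix-game and gridworld experiments are what actually substantiate the convergence claims above.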
Related papers
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium-convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - Leading the Pack: N-player Opponent Shaping [52.682734939786464]
We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, the relative performance of OS methods degrades, suggesting that in the limit OS methods may not perform well.
arXiv Detail & Related papers (2023-12-19T20:01:42Z) - Efficiently Computing Nash Equilibria in Adversarial Team Markov Games [19.717850955051837]
We introduce a class of games in which a team of identically interested players competes against an adversarial player.
This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games.
Our main contribution is the first algorithm for computing stationary $\epsilon$-approximate Nash equilibria in adversarial team Markov games.
arXiv Detail & Related papers (2022-08-03T16:41:01Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games [69.5064797859053]
We introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal policy to the population in each iteration.
SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
arXiv Detail & Related papers (2022-07-13T22:55:51Z) - Efficient Policy Space Response Oracles [61.71849698253696]
The Policy Space Response Oracles (PSRO) method provides a general framework for computing Nash equilibria in two-player zero-sum games.
Central to our development is the newly introduced minimax optimization over unrestricted-restricted (URR) games.
We report a 50x speedup in wall-time, 10x data efficiency, and similar exploitability as existing PSRO methods on Kuhn and Leduc Poker games.
arXiv Detail & Related papers (2022-01-28T17:54:45Z) - Discovering Diverse Multi-Agent Strategic Behavior via Reward
Randomization [42.33734089361143]
We propose a technique for discovering diverse strategic policies in complex multi-agent games.
We derive a new algorithm, Reward-Randomized Policy Gradient (RPG).
RPG is able to discover multiple distinctive human-interpretable strategies in challenging temporal trust dilemmas.
arXiv Detail & Related papers (2021-03-08T06:26:55Z) - Independent Policy Gradient Methods for Competitive Reinforcement
Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
arXiv Detail & Related papers (2021-01-11T23:20:42Z) - Joint Policy Search for Multi-agent Collaboration with Imperfect
Information [31.559835225116473]
We show that global changes in game values can be decomposed into policy changes localized at each information set.
We propose Joint Policy Search that iteratively improves joint policies of collaborative agents in imperfect information games.
arXiv Detail & Related papers (2020-08-14T17:58:47Z) - Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
arXiv Detail & Related papers (2020-06-08T14:33:31Z)