Leading the Pack: N-player Opponent Shaping
- URL: http://arxiv.org/abs/2312.12564v2
- Date: Tue, 26 Dec 2023 11:23:25 GMT
- Title: Leading the Pack: N-player Opponent Shaping
- Authors: Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu,
Edward Grefenstette, Tim Rocktäschel
- Abstract summary: We extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents.
We find that when playing with a large number of co-players, OS methods' relative performance declines, suggesting that in the limit OS methods may not perform well.
- Score: 52.682734939786464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning solutions have had great success in the 2-player
general-sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which
agents account for the learning of their co-players, has led to agents that are able
to avoid collectively bad outcomes whilst also maximizing their reward. These methods
have so far been limited to 2-player games. However, the real world involves
interactions with many more agents, on both local and global scales. In this paper,
we extend Opponent Shaping (OS) methods to environments involving multiple co-players
and multiple shaping agents. We evaluate on over 4 different environments, varying the
number of players from 3 to 5, and demonstrate that model-based OS methods converge to
equilibria with better global welfare than naive learning. However, we find that when
playing with a large number of co-players, OS methods' relative performance declines,
suggesting that in the limit OS methods may not perform well. Finally, we explore
scenarios where more than one OS method is present, observing that within games
requiring a majority of cooperating agents, OS methods converge to outcomes with poor
global welfare.
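For readers new to the Opponent Shaping paradigm the abstract refers to, the sketch below illustrates the core idea with a LOLA-style look-ahead update: one shaping agent differentiates its reward through a single naive gradient step of its co-players, so its own update accounts for how it influences their learning. This is a minimal illustration only, not the model-based OS methods evaluated in the paper; the toy N-player coordination game, the hyperparameters, and the one-step look-ahead are all assumptions made for the example.

```python
# Minimal sketch of LOLA-style opponent shaping in a toy N-player game.
# Illustrative only: game, learning rates, and one-step look-ahead are assumptions.
import jax
import jax.numpy as jnp

N = 4             # total players: one shaping agent (player 0) and N-1 naive learners
LR = 1.0          # step size; the shaper assumes its co-players use this learning rate
BENEFIT = 2.0     # reward for cooperating when co-players also cooperate
COST = 1.5        # cost of cooperating regardless of what others do


def payoff(thetas, i):
    """Player i's expected reward in an illustrative N-player coordination game:
    cooperating only pays off if enough co-players also cooperate."""
    p = jax.nn.sigmoid(thetas)                  # each player's cooperation probability
    others = (jnp.sum(p) - p[i]) / (N - 1)      # co-players' mean cooperation
    return BENEFIT * p[i] * others - COST * p[i]


def naive_step(thetas):
    """One simultaneous gradient-ascent step in which every player follows only
    the gradient of its own payoff with respect to its own parameter."""
    own_grads = jnp.stack(
        [jax.grad(lambda th, i=i: payoff(th, i))(thetas)[i] for i in range(N)]
    )
    return thetas + LR * own_grads


def shaping_objective(thetas, shaper=0):
    """The shaper's reward evaluated *after* its co-players take one naive step,
    so differentiating it accounts for how the shaper influences their learning."""
    stepped = naive_step(thetas)
    stepped = stepped.at[shaper].set(thetas[shaper])  # the shaper itself does not pre-step
    return payoff(stepped, shaper)


thetas = jnp.zeros(N)  # every player starts at a 50% cooperation probability
for _ in range(300):
    shaper_grad = jax.grad(shaping_objective)(thetas)[0]  # gradient through co-players' update
    stepped = naive_step(thetas)                          # naive co-players update themselves
    thetas = stepped.at[0].set(thetas[0] + LR * shaper_grad)

print("final cooperation probabilities:", jax.nn.sigmoid(thetas))
```

Player 0 is the single shaper here; the multi-shaper setting the abstract describes would correspond to several agents optimizing this kind of look-ahead objective simultaneously.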
Related papers
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Scaling Opponent Shaping to High Dimensional Games [17.27358464280679]
We develop an OS-based approach to general-sum games with temporally extended actions and long time horizons.
We show that Shaper leads to improved individual and collective outcomes in a range of challenging settings from the literature.
arXiv Detail & Related papers (2023-12-19T20:05:23Z)
- Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z)
- MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning [22.28076947612619]
We introduce Multi-Agent Environment Design Strategist for Open-Ended Learning (MAESTRO)
MAESTRO is the first multi-agent UED approach for two-player zero-sum settings.
Our experiments show that MAESTRO outperforms a number of strong baselines on competitive two-player games.
arXiv Detail & Related papers (2023-03-06T18:57:41Z)
- ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z)
- An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit [93.97385339354318]
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits.
First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps.
Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players.
arXiv Detail & Related papers (2021-11-08T23:38:47Z)
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allow the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations.
The solution based on our algorithm beats all the solutions for the famous MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
- Learning to Play No-Press Diplomacy with Best Response Policy Iteration [31.367850729299665]
We apply deep reinforcement learning methods to Diplomacy, a 7-player board game.
We show that our agents convincingly outperform the previous state-of-the-art, and game theoretic equilibrium analysis shows that the new process yields consistent improvements.
arXiv Detail & Related papers (2020-06-08T14:33:31Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)