Discovering Diverse Multi-Agent Strategic Behavior via Reward
Randomization
- URL: http://arxiv.org/abs/2103.04564v1
- Date: Mon, 8 Mar 2021 06:26:55 GMT
- Title: Discovering Diverse Multi-Agent Strategic Behavior via Reward
Randomization
- Authors: Zhenggang Tang, Chao Yu, Boyuan Chen, Huazhe Xu, Xiaolong Wang, Fei
Fang, Simon Du, Yu Wang, Yi Wu
- Abstract summary: We propose a technique for discovering diverse strategic policies in complex multi-agent games.
We derive a new algorithm, Reward-Randomized Policy Gradient (RPG).
RPG is able to discover multiple distinctive human-interpretable strategies in challenging temporal trust dilemmas.
- Score: 42.33734089361143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a simple, general and effective technique, Reward Randomization,
for discovering diverse strategic policies in complex multi-agent games.
Combining reward randomization and policy gradient, we derive a new algorithm,
Reward-Randomized Policy Gradient (RPG). RPG is able to discover multiple
distinctive human-interpretable strategies in challenging temporal trust
dilemmas, including grid-world games and a real-world game Agar.io, where
multiple equilibria exist but standard multi-agent policy gradient algorithms
always converge to a fixed one with a sub-optimal payoff for every player, even
when state-of-the-art exploration techniques are used. Furthermore, with the set of
diverse strategies from RPG, we can (1) achieve higher payoffs by fine-tuning
the best policy from the set; and (2) obtain an adaptive agent by using this
set of strategies as its training opponents. The source code and example videos
can be found on our website: https://sites.google.com/view/staghuntrpg.
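The core recipe is simple enough to illustrate in a few lines. Below is a minimal sketch of reward randomization on a 2x2 stag-hunt matrix game; the payoffs, perturbation range, and the tabular softmax policy-gradient learner are illustrative assumptions standing in for the paper's deep multi-agent RL setup, so this is a sketch of the idea rather than the authors' implementation.

```python
# Minimal sketch of Reward-Randomized Policy Gradient (RPG) on a 2x2 stag hunt.
# Payoffs, perturbation range, and the tabular learner are illustrative
# assumptions, not the paper's implementation.
import numpy as np

rng = np.random.default_rng(0)

# Row player's payoffs; actions: 0 = Stag, 1 = Hare. (Stag, Stag) is the
# high-payoff equilibrium, (Hare, Hare) the safe but sub-optimal one.
R_true = np.array([[4.0, 0.0],
                   [3.0, 3.0]])

def policy_gradient(R, iters=2000, lr=0.1):
    """Train two softmax policies by expected-payoff gradient ascent
    (self-play on the symmetric game defined by R)."""
    theta = np.zeros((2, 2))                        # one logit row per player
    for _ in range(iters):
        p = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)
        for i in range(2):
            M = R if i == 0 else R.T                # column player's payoff is R^T
            q = M @ p[1 - i]                        # value of each own action
            theta[i] += lr * p[i] * (q - p[i] @ q)  # softmax policy gradient
    return np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)

def payoff(p, R):
    return p[0] @ R @ p[1]

# 1) Train a population of policies, each under a randomly perturbed reward.
candidates = [policy_gradient(R_true + rng.uniform(-2, 2, R_true.shape))
              for _ in range(10)]

# 2) Re-evaluate every candidate under the ORIGINAL reward and keep the best;
#    RPG then fine-tunes this policy on the true reward.
best = max(candidates, key=lambda p: payoff(p, R_true))
print("Best payoff on the true game:", round(payoff(best, R_true), 2))
```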
Related papers
- Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed
Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z) - Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
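As a toy illustration of the role of risk preference, the snippet below ranks a safe and a risky strategy under an exponential-utility criterion with different risk parameters; the criterion and the numbers are assumptions for illustration, not the paper's training objective.

```python
# Toy illustration: different risk preferences rank the same stochastic payoffs
# differently. The exponential-utility criterion is an assumed stand-in for the
# paper's risk-sensitive learning objective.
import numpy as np

def risk_sensitive_value(returns, beta):
    """beta < 0: risk-averse, beta > 0: risk-seeking, beta -> 0: risk-neutral."""
    return np.log(np.mean(np.exp(beta * np.asarray(returns)))) / beta

safe  = [1.0, 1.0, 1.0, 1.0]        # low-variance strategy
risky = [3.0, 3.0, -0.5, -0.5]      # higher mean, higher variance

for beta in (-2.0, 2.0):
    pick = "risky" if risk_sensitive_value(risky, beta) > risk_sensitive_value(safe, beta) else "safe"
    print(f"beta = {beta:+.1f} prefers the {pick} strategy")
```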
arXiv Detail & Related papers (2023-05-19T06:56:02Z) - Finding mixed-strategy equilibria of continuous-action games without
gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
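A minimal sketch of the representation: sampling latent noise and mapping it through a small network yields draws from an essentially unrestricted mixed strategy over a continuous action space. The architecture and the (random) weights below are placeholders, not the paper's trained strategies.

```python
# Sketch of a randomized policy network representing a mixed strategy over a
# continuous action space: latent noise z is mapped through a small MLP to an
# action, so sampling z samples from the mixed strategy. Weights are random
# placeholders, not trained strategies.
import numpy as np

rng = np.random.default_rng(1)

class RandomizedPolicy:
    def __init__(self, noise_dim=4, action_dim=2, hidden=32):
        self.W1 = rng.normal(scale=0.5, size=(noise_dim, hidden))
        self.W2 = rng.normal(scale=0.5, size=(hidden, action_dim))

    def sample_action(self):
        z = rng.normal(size=self.W1.shape[0])   # the randomization
        h = np.tanh(z @ self.W1)
        return np.tanh(h @ self.W2)             # action in [-1, 1]^action_dim

policy = RandomizedPolicy()
samples = np.array([policy.sample_action() for _ in range(5)])
print(samples)   # distinct continuous actions drawn from one mixed strategy
```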
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Continuously Discovering Novel Strategies via Reward-Switching Policy
Optimization [9.456388509414046]
We present Reward-Switching Policy Optimization (RSPO), a paradigm for discovering diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones.
Experiments show that RSPO is able to discover a wide spectrum of strategies in a variety of domains, ranging from single-agent particle-world tasks and MuJoCo continuous control to multi-agent stag-hunt games and StarCraft II challenges.
arXiv Detail & Related papers (2022-04-04T12:38:58Z)
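As a rough analogue of the iterative discover-then-exclude loop, the sketch below finds several distinct high-reward arms of a toy bandit; a simple novelty penalty on already-discovered arms stands in for RSPO's per-step reward switching, so it illustrates the loop rather than the paper's exact method.

```python
# Rough analogue of iterative strategy discovery on a toy bandit with several
# locally good "strategies" (arms). A novelty penalty on discovered arms stands
# in for RSPO's per-step reward switching.
import numpy as np

rng = np.random.default_rng(0)
arm_means = np.array([1.0, 0.9, 0.8, 0.2, 0.1])     # three good strategies

def train_policy(penalty, iters=4000, lr=0.1):
    """REINFORCE on a softmax policy with reward = task reward - penalty."""
    theta = np.zeros(len(arm_means))
    for _ in range(iters):
        p = np.exp(theta - theta.max()); p /= p.sum()
        a = rng.choice(len(p), p=p)
        r = rng.normal(arm_means[a], 0.1) - penalty[a]
        grad = -p.copy(); grad[a] += 1.0             # grad of log pi(a)
        theta += lr * r * grad
    return int(np.argmax(theta))

discovered, penalty = [], np.zeros(len(arm_means))
for _ in range(3):
    arm = train_policy(penalty)
    discovered.append(arm)
    penalty[arm] += 5.0        # discourage rediscovering the same strategy
print("discovered strategies (arm indices):", discovered)
```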
arXiv Detail & Related papers (2022-04-04T12:38:58Z) - Learning Meta Representations for Agents in Multi-Agent Reinforcement
Learning [12.170248966278281]
In multi-agent reinforcement learning, the behaviors that agents learn in a single Markov Game (MG) are typically confined to the given number of agents.
In this work, our focus is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games.
arXiv Detail & Related papers (2021-08-30T04:30:53Z) - Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes)
We show how it can be parameterized so that a quality-diversity algorithm (MAP-Elites) can be used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
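For reference, a minimal MAP-Elites loop is sketched below with a toy fitness function and behaviour descriptor; the paper instead derives both from in-game statistics of the Tribes agent, so everything here is an illustrative placeholder.

```python
# Minimal MAP-Elites loop with a toy fitness function and behaviour descriptor.
# The paper's descriptors and fitness come from in-game statistics of the
# Tribes agent; everything below is an illustrative placeholder.
import numpy as np

rng = np.random.default_rng(0)

def evaluate(params):
    """Toy stand-ins: fitness = closeness to 0.5, behaviour = binned params."""
    fitness = -np.linalg.norm(params - 0.5)
    behaviour = tuple(np.minimum((params * 5).astype(int), 4))
    return fitness, behaviour

archive = {}                                   # behaviour cell -> (fitness, params)
for _ in range(2000):
    if archive and rng.random() < 0.9:         # mutate a random elite ...
        parent = list(archive.values())[rng.integers(len(archive))][1]
        child = np.clip(parent + rng.normal(0.0, 0.1, size=2), 0.0, 1.0)
    else:                                      # ... or sample a fresh candidate
        child = rng.random(2)
    fit, cell = evaluate(child)
    if cell not in archive or fit > archive[cell][0]:
        archive[cell] = (fit, child)           # keep the elite of each cell
print(len(archive), "distinct play-style cells filled")
```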
arXiv Detail & Related papers (2021-04-17T20:33:24Z) - Modelling Behavioural Diversity for Learning in Open-Ended Games [15.978932309579013]
We offer a geometric interpretation of behavioural diversity in games.
We introduce a novel diversity metric based on determinantal point processes (DPPs).
We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games.
arXiv Detail & Related papers (2021-03-14T13:42:39Z)
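As a concrete illustration of a DPP-style diversity score, the snippet below computes the log-determinant of an RBF kernel over behaviour embeddings, which grows as the behaviours become mutually dissimilar; the featurisation and kernel choice are assumptions, since the paper defines its kernel over policies in the game itself.

```python
# DPP-style diversity score: log-determinant of an RBF kernel matrix over
# behaviour embeddings grows as the behaviours become mutually dissimilar.
# The featurisation is an assumption, not the paper's exact kernel.
import numpy as np

def dpp_diversity(features, bandwidth=1.0, jitter=1e-6):
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    K = np.exp(-d2 / (2.0 * bandwidth ** 2))          # RBF similarity kernel
    _, logdet = np.linalg.slogdet(K + jitter * np.eye(len(K)))
    return logdet

near_duplicates = np.array([[1.0, 0.0], [1.01, 0.0], [0.99, 0.01]])
spread_out      = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
print(dpp_diversity(near_duplicates), "<", dpp_diversity(spread_out))
```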
arXiv Detail & Related papers (2021-03-14T13:42:39Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z) - Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.