Unifying Behavioral and Response Diversity for Open-ended Learning in
Zero-sum Games
- URL: http://arxiv.org/abs/2106.04958v2
- Date: Thu, 10 Jun 2021 16:00:18 GMT
- Authors: Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng
Chen, Changjie Fan, Zhipeng Hu
- Abstract summary: Open-ended learning algorithms lack widely accepted definitions of diversity, which makes diverse policies hard to construct and evaluate.
We propose a unified measure of diversity for multi-agent open-ended learning based on both Behavioral Diversity (BD) and Response Diversity (RD).
We show that many current diversity measures fall into the category of BD or RD, but not both.
With this unified diversity measure, we design a corresponding diversity-promoting objective and a notion of population effectivity for seeking best responses in open-ended learning.
- Score: 44.30509625560908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring and promoting policy diversity is critical for solving games with
strong non-transitive dynamics where strategic cycles exist, and there is no
consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a
pool of diverse policies via open-ended learning is an attractive solution,
which can generate auto-curricula to avoid being exploited. However, in
conventional open-ended learning algorithms, there are no widely accepted
definitions for diversity, making it hard to construct and evaluate the diverse
policies. In this work, we summarize previous concepts of diversity and work
towards offering a unified measure of diversity in multi-agent open-ended
learning to include all elements in Markov games, based on both Behavioral
Diversity (BD) and Response Diversity (RD). At the trajectory distribution
level, we re-define BD in the state-action space as the discrepancies of
occupancy measures. For the reward dynamics, we propose RD to characterize
diversity through the responses of policies when encountering different
opponents. We also show that many current diversity measures fall in one of the
categories of BD or RD but not both. With this unified diversity measure, we
design the corresponding diversity-promoting objective and population
effectivity when seeking the best responses in open-ended learning. We validate
our methods both in relatively simple settings, a matrix game and a
non-transitive mixture model, and in the complex \textit{Google Research
Football} environment. The population found by our methods achieves the lowest
exploitability and the highest population effectivity in the matrix game and the
non-transitive mixture model, as well as the largest goal difference when
playing against opponents of various levels in \textit{Google Research
Football}.
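As a rough illustration (not the authors' implementation), the two components can be sketched in a one-state matrix game: BD as a discrepancy between policies' action distributions (the one-state reduction of occupancy measures over state-action space), and RD as a distance between payoff vectors obtained against a fixed pool of opponents. All names and the L1/L2 distance choices below are illustrative assumptions.

```python
import numpy as np

# Rock-Paper-Scissors payoff matrix for the row player.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def behavioral_diversity(pi_new, population):
    """BD sketch: smallest L1 discrepancy between the candidate's
    action distribution and those already in the population.
    (The paper defines BD on occupancy measures; a one-state
    matrix game reduces that to action mixtures.)"""
    return min(np.abs(pi_new - pi).sum() for pi in population)

def response_diversity(pi_new, population, opponents):
    """RD sketch: distance between payoff vectors ("responses")
    collected against a fixed pool of opponent strategies."""
    def payoffs(pi):
        return np.array([pi @ PAYOFF @ opp for opp in opponents])
    r_new = payoffs(pi_new)
    return min(np.linalg.norm(r_new - payoffs(pi)) for pi in population)

population = [np.array([1.0, 0.0, 0.0])]          # pure Rock
opponents  = [np.eye(3)[i] for i in range(3)]     # pure R, P, S
candidate  = np.array([0.0, 1.0, 0.0])            # pure Paper

bd = behavioral_diversity(candidate, population)
rd = response_diversity(candidate, population, opponents)
print(bd, rd)  # Paper differs from Rock both behaviorally and in its responses
```

A diversity-promoting objective would then add a weighted combination of these two terms to the best-response payoff when growing the population.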
Related papers
- Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning [8.905920197601173]
We introduce Diversity Control (DiCo), a method able to control diversity to an exact value of a given metric.
We show how DiCo can be employed as a novel paradigm to increase performance and sample efficiency in Multi-Agent Reinforcement Learning.
arXiv Detail & Related papers (2024-05-23T21:03:33Z)
- Iteratively Learn Diverse Strategies with State Distance Information [18.509323383456707]
In complex reinforcement learning problems, policies with similar rewards may have substantially different behaviors.
We develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties.
arXiv Detail & Related papers (2023-10-23T02:41:34Z)
- Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
arXiv Detail & Related papers (2023-10-23T02:27:31Z)
- Generating Personas for Games with Multimodal Adversarial Imitation Learning [47.70823327747952]
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level.
Going beyond reinforcement learning is necessary to model a wide range of human playstyles.
This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting.
arXiv Detail & Related papers (2023-08-15T06:58:19Z)
- Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with Mujoco and Atari.
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agents' behavioral differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
The decomposed factors can significantly impact policy optimization on three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity [49.68758494467258]
We study how to construct diverse populations of agents by carefully structuring how individuals within a population interact.
Our approach is based on interaction graphs, which control the flow of information between agents during training.
We provide evidence for the importance of diversity in multi-agent training and analyse the effect of applying different interaction graphs on the training trajectories, diversity and performance of populations in a range of games.
arXiv Detail & Related papers (2021-10-08T11:29:52Z)
- Modelling Behavioural Diversity for Learning in Open-Ended Games [15.978932309579013]
We offer a geometric interpretation of behavioural diversity in games.
We introduce a novel diversity metric based on determinantal point processes (DPPs).
We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games.
arXiv Detail & Related papers (2021-03-14T13:42:39Z)
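The DPP-based metric in the last entry scores a population by the volume spanned by per-policy feature vectors, so near-duplicate policies contribute almost nothing. A minimal sketch of that idea (the feature vectors here are hypothetical, not the paper's construction):

```python
import numpy as np

def dpp_diversity(features):
    """DPP-style diversity score: determinant of the Gram (kernel)
    matrix of per-policy feature vectors. The determinant equals the
    squared volume spanned by the features, so linearly dependent
    (redundant) policies drive the score toward zero."""
    F = np.asarray(features, dtype=float)
    return float(np.linalg.det(F @ F.T))

distinct  = dpp_diversity([[1.0, 0.0], [0.0, 1.0]])   # orthogonal features
redundant = dpp_diversity([[1.0, 0.0], [1.0, 1e-3]])  # near-duplicates
print(distinct, redundant)
```

Maximizing such a determinant when adding a best response is one way a diverse population can be encouraged.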
This list is automatically generated from the titles and abstracts of the papers in this site.