Unifying Behavioral and Response Diversity for Open-ended Learning in
Zero-sum Games
- URL: http://arxiv.org/abs/2106.04958v2
- Date: Thu, 10 Jun 2021 16:00:18 GMT
- Authors: Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng
Chen, Changjie Fan, Zhipeng Hu
- Abstract summary: Open-ended learning algorithms lack widely accepted definitions of diversity, which makes diverse policies hard to construct and evaluate.
We propose a unified measure of diversity for multi-agent open-ended learning based on both Behavioral Diversity (BD) and Response Diversity (RD).
We show that many current diversity measures fall into the category of BD or RD, but not both.
With this unified diversity measure, we design a corresponding diversity-promoting objective and a notion of population effectivity for seeking best responses in open-ended learning.
- Score: 44.30509625560908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring and promoting policy diversity is critical for solving games with
strong non-transitive dynamics where strategic cycles exist, and there is no
consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a
pool of diverse policies via open-ended learning is an attractive solution,
which can generate auto-curricula to avoid being exploited. However, in
conventional open-ended learning algorithms, there are no widely accepted
definitions for diversity, making it hard to construct and evaluate the diverse
policies. In this work, we summarize previous concepts of diversity and work
towards offering a unified measure of diversity in multi-agent open-ended
learning to include all elements in Markov games, based on both Behavioral
Diversity (BD) and Response Diversity (RD). At the trajectory distribution
level, we re-define BD in the state-action space as the discrepancies of
occupancy measures. For the reward dynamics, we propose RD to characterize
diversity through the responses of policies when encountering different
opponents. We also show that many current diversity measures fall in one of the
categories of BD or RD but not both. With this unified diversity measure, we
design the corresponding diversity-promoting objective and population
effectivity when seeking the best responses in open-ended learning. We validate
our methods both in relatively simple settings, a matrix game and a
non-transitive mixture model, and in the complex \textit{Google Research
Football} environment. The population found by our methods achieves the lowest
exploitability and the highest population effectivity in the matrix game and the
non-transitive mixture model, as well as the largest goal difference when
playing against opponents of various levels in \textit{Google Research
Football}.
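As a rough illustration (not the authors' implementation), the two components can be sketched in a one-state matrix game: BD as a discrepancy between policies' action distributions (the one-state reduction of occupancy measures over state-action space), and RD as a distance between payoff vectors obtained against a fixed pool of opponents. All names and the L1/L2 distance choices below are illustrative assumptions.

```python
import numpy as np

# Rock-Paper-Scissors payoff matrix for the row player.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def behavioral_diversity(pi_new, population):
    """BD sketch: smallest L1 discrepancy between the candidate's
    action distribution and those already in the population.
    (The paper defines BD on occupancy measures; a one-state
    matrix game reduces that to action mixtures.)"""
    return min(np.abs(pi_new - pi).sum() for pi in population)

def response_diversity(pi_new, population, opponents):
    """RD sketch: distance between payoff vectors ("responses")
    collected against a fixed pool of opponent strategies."""
    def payoffs(pi):
        return np.array([pi @ PAYOFF @ opp for opp in opponents])
    r_new = payoffs(pi_new)
    return min(np.linalg.norm(r_new - payoffs(pi)) for pi in population)

population = [np.array([1.0, 0.0, 0.0])]          # pure Rock
opponents  = [np.eye(3)[i] for i in range(3)]     # pure R, P, S
candidate  = np.array([0.0, 1.0, 0.0])            # pure Paper

bd = behavioral_diversity(candidate, population)
rd = response_diversity(candidate, population, opponents)
print(bd, rd)  # Paper differs from Rock both behaviorally and in its responses
```

A diversity-promoting objective would then add a weighted combination of these two terms to the best-response payoff when growing the population.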
Related papers
- Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning [8.905920197601173]
We introduce Diversity Control (DiCo), a method able to control diversity to an exact value of a given metric.
We show how DiCo can be employed as a novel paradigm to increase performance and sample efficiency in Multi-Agent Reinforcement Learning.
arXiv Detail & Related papers (2024-05-23T21:03:33Z)
- Iteratively Learn Diverse Strategies with State Distance Information [18.509323383456707]
In complex reinforcement learning problems, policies with similar rewards may have substantially different behaviors.
We develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties.
arXiv Detail & Related papers (2023-10-23T02:41:34Z)
- Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
arXiv Detail & Related papers (2023-10-23T02:27:31Z)
- Generating Personas for Games with Multimodal Adversarial Imitation Learning [47.70823327747952]
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level.
Going beyond reinforcement learning is necessary to model a wide range of human playstyles.
This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting.
arXiv Detail & Related papers (2023-08-15T06:58:19Z)
- Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with Mujoco and Atari.
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agents' behavioral differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
The decomposed factors can significantly impact policy optimization on three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity [49.68758494467258]
We study how to construct diverse populations of agents by carefully structuring how individuals within a population interact.
Our approach is based on interaction graphs, which control the flow of information between agents during training.
We provide evidence for the importance of diversity in multi-agent training and analyse the effect of applying different interaction graphs on the training trajectories, diversity and performance of populations in a range of games.
arXiv Detail & Related papers (2021-10-08T11:29:52Z)
- Modelling Behavioural Diversity for Learning in Open-Ended Games [15.978932309579013]
We offer a geometric interpretation of behavioural diversity in games.
We introduce a novel diversity metric based on determinantal point processes (DPPs).
We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games.
arXiv Detail & Related papers (2021-03-14T13:42:39Z)
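The DPP-based metric in the last entry scores a population by the volume spanned by per-policy feature vectors, so near-duplicate policies contribute almost nothing. A minimal sketch of that idea (the feature vectors here are hypothetical, not the paper's construction):

```python
import numpy as np

def dpp_diversity(features):
    """DPP-style diversity score: determinant of the Gram (kernel)
    matrix of per-policy feature vectors. The determinant equals the
    squared volume spanned by the features, so linearly dependent
    (redundant) policies drive the score toward zero."""
    F = np.asarray(features, dtype=float)
    return float(np.linalg.det(F @ F.T))

distinct  = dpp_diversity([[1.0, 0.0], [0.0, 1.0]])   # orthogonal features
redundant = dpp_diversity([[1.0, 0.0], [1.0, 1e-3]])  # near-duplicates
print(distinct, redundant)
```

Maximizing such a determinant when adding a best response is one way a diverse population can be encouraged.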
This list is automatically generated from the titles and abstracts of the papers in this site.