Policy Diversity for Cooperative Agents
- URL: http://arxiv.org/abs/2308.14308v1
- Date: Mon, 28 Aug 2023 05:23:16 GMT
- Title: Policy Diversity for Cooperative Agents
- Authors: Mingxi Tan, Andong Tian and Ludovic Denoyer
- Abstract summary: Multi-agent reinforcement learning aims to find the optimal team cooperative policy to complete a task.
There may exist multiple different ways of cooperating, and domain experts often need to see these alternatives.
Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain.
- Score: 8.689289576285095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard cooperative multi-agent reinforcement learning (MARL) methods aim to
find the optimal team cooperative policy to complete a task. However, there may
exist multiple different ways of cooperating, and domain experts often need to
see these alternatives. Therefore, identifying a set of significantly different
policies can reduce the complexity of the task for them. Unfortunately, there is a
general lack of effective policy diversity approaches specifically designed for
the multi-agent domain. In this work, we propose a method called
Moment-Matching Policy Diversity to alleviate this problem. This method can
generate team policies that differ to varying degrees by formalizing the
difference between team policies as the difference in the actions taken by
selected agents under each policy. Theoretically, we show that our method is a
simple way to implement a constrained optimization problem that regularizes the
difference between two trajectory distributions by using the maximum mean
discrepancy. The effectiveness of our approach is demonstrated on a challenging
team-based shooter.
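The exact training objective is not reproduced in this summary, but the maximum mean discrepancy (MMD) the abstract refers to is a standard quantity. Below is a minimal sketch of an empirical MMD^2 estimate between two batches of selected-agent actions, assuming a Gaussian kernel; the kernel choice, bandwidth, and the random example arrays are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)), evaluated pairwise.
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd_squared(actions_a, actions_b, sigma=1.0):
    """Biased empirical MMD^2 between two batches of (flattened) agent actions.

    actions_a, actions_b: arrays of shape (n, d) and (m, d).
    """
    k_aa = gaussian_kernel(actions_a, actions_a, sigma)
    k_bb = gaussian_kernel(actions_b, actions_b, sigma)
    k_ab = gaussian_kernel(actions_a, actions_b, sigma)
    return k_aa.mean() + k_bb.mean() - 2.0 * k_ab.mean()

# Example: actions taken by the selected agents under two team policies.
rng = np.random.default_rng(0)
acts_policy_1 = rng.normal(0.0, 1.0, size=(256, 4))
acts_policy_2 = rng.normal(0.5, 1.0, size=(256, 4))
print(mmd_squared(acts_policy_1, acts_policy_2))
```

In the constrained-optimization view described above, a term of this form would serve as the regularizer keeping a newly trained team policy's action (and hence trajectory) distribution measurably different from a previously found one; which agents are selected and how the constraint enters training are specified in the paper itself.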
Related papers
- Local Optimization Achieves Global Optimality in Multi-Agent
Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
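Since each agent's local policy is updated "similarly to vanilla PPO", the per-agent building block is the standard clipped surrogate objective. The sketch below shows only that generic objective, not the paper's full algorithm; the batch shapes and clipping constant are assumptions.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from vanilla PPO, applied to one agent's policy."""
    ratio = torch.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the surrogate, so the per-agent loss is its negation.
    return -torch.min(unclipped, clipped).mean()

# Dummy batch of 5 transitions for a single agent.
print(ppo_clip_loss(torch.randn(5), torch.randn(5), torch.randn(5)))
```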
arXiv Detail & Related papers (2023-05-08T16:20:03Z) - Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
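The summary does not say how the dispersion matrix is built from the policy embeddings. One natural reading, given here purely as an assumption, is a pairwise-distance matrix over the embeddings, whose off-diagonal entries indicate how spread out the candidate policy set is.

```python
import numpy as np

def dispersion_matrix(policy_embeddings):
    """Pairwise Euclidean distances between policy embeddings (one row per policy)."""
    diffs = policy_embeddings[:, None, :] - policy_embeddings[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

embeddings = np.random.default_rng(1).normal(size=(5, 16))   # 5 candidate policies
D = dispersion_matrix(embeddings)
# A larger mean off-diagonal entry indicates a more spread-out (diverse) policy set.
print(D[np.triu_indices(5, k=1)].mean())
```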
arXiv Detail & Related papers (2023-02-28T11:58:39Z) - DGPO: Discovering Multiple Strategies with Diversity-Guided Policy
Optimization [34.40615558867965]
We propose an on-policy algorithm that discovers multiple strategies for solving a given task.
Unlike prior work, it achieves this with a shared policy network trained over a single run.
Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks.
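The summary states only that a single shared policy network, trained in one run, represents several strategies. A common way to realize such sharing, sketched here as an illustrative assumption rather than DGPO's actual architecture, is to condition the shared network on a discrete strategy index; training would then typically add a diversity bonus that makes trajectories identifiable from that index.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

class LatentConditionedPolicy(nn.Module):
    """Shared policy network conditioned on a discrete strategy index z."""
    def __init__(self, obs_dim, n_actions, n_strategies, hidden=64):
        super().__init__()
        self.n_strategies = n_strategies
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_strategies, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, z):
        z_onehot = F.one_hot(z, self.n_strategies).float()
        logits = self.net(torch.cat([obs, z_onehot], dim=-1))
        return Categorical(logits=logits)

# Dummy usage: the same network, queried with two different strategy indices.
policy = LatentConditionedPolicy(obs_dim=8, n_actions=4, n_strategies=3)
dist = policy(torch.zeros(2, 8), torch.tensor([0, 2]))
print(dist.sample())
```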
arXiv Detail & Related papers (2022-07-12T15:57:55Z) - Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent
RL [107.58821842920393]
We quantify the agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
The decomposed factors can significantly impact policy optimization along three popular directions.
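Role diversity has a precise definition in that paper. As a loose illustration of quantifying agents' behavior differences, the sketch below computes a simple proxy, the mean pairwise total-variation distance between agents' action distributions over a shared set of states; this particular measure is an assumption, not the paper's definition.

```python
import numpy as np

def behavior_difference(action_probs):
    """Mean pairwise total-variation distance between agents' action distributions.

    action_probs: array of shape (n_agents, n_states, n_actions).
    """
    n_agents = action_probs.shape[0]
    dists = []
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            tv = 0.5 * np.abs(action_probs[i] - action_probs[j]).sum(axis=-1)
            dists.append(tv.mean())
    return float(np.mean(dists))

# 3 agents, 10 states, 4 actions; agent 0 prefers action 0, the others are uniform.
probs = np.full((3, 10, 4), 0.25)
probs[0, :, 0] = 0.7
probs[0, :, 1:] = 0.1
print(behavior_difference(probs))   # > 0 because agent 0 behaves differently
```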
arXiv Detail & Related papers (2022-06-01T04:58:52Z) - CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We consider and study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
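CAMEO's name points to Metropolis-style sampling over policies, augmented with a curiosity term. The generic Metropolis acceptance step such a sampler relies on is sketched below; the Gaussian proposal and the placeholder score function are assumptions, not the paper's actual formulation.

```python
import numpy as np

def metropolis_step(theta, log_score, proposal_std=0.05, rng=None):
    """One Metropolis step over policy parameters theta.

    log_score returns an unnormalized log-density over parameters; in a
    CAMEO-like setting it would combine return with an exploration/curiosity
    term, but here it is just a caller-supplied placeholder.
    """
    rng = rng or np.random.default_rng()
    theta_new = theta + rng.normal(0.0, proposal_std, size=theta.shape)
    log_accept = log_score(theta_new) - log_score(theta)
    if np.log(rng.uniform()) < log_accept:
        return theta_new
    return theta

# Dummy usage: drift toward the maximum of a quadratic "score".
theta = np.ones(4)
for _ in range(200):
    theta = metropolis_step(theta, log_score=lambda t: -np.sum(t ** 2))
print(theta)
```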
arXiv Detail & Related papers (2022-05-19T09:48:56Z) - Developing cooperative policies for multi-stage reinforcement learning
tasks [0.0]
Many hierarchical reinforcement learning algorithms utilise a series of independent skills as a basis to solve tasks at a higher level of reasoning.
This paper proposes the Cooperative Consecutive Policies (CCP) method of enabling consecutive agents to cooperatively solve long time horizon multi-stage tasks.
arXiv Detail & Related papers (2022-05-11T01:31:04Z) - Asynchronous, Option-Based Multi-Agent Policy Gradient: A Conditional
Reasoning Approach [10.904610735933145]
Multi-agent policy gradient (MAPG) methods are commonly used to learn cooperative multi-agent policies.
In complex problems with large state and action spaces, it is advantageous to extend MAPG methods to use higher-level actions.
We propose a novel, conditional reasoning approach to address this problem.
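Using "higher-level actions" generally means each agent commits to a temporally extended option and follows its low-level policy until it terminates, re-deciding asynchronously. The skeleton below illustrates only that control flow for a single agent; the class and function names are assumptions, not the paper's API.

```python
class Option:
    """A temporally extended action: a low-level policy plus a termination test."""
    def __init__(self, name, act_fn, terminate_fn):
        self.name, self.act, self.terminates = name, act_fn, terminate_fn

def run_agent_with_options(env_step, choose_option, options, obs, horizon=20):
    """One agent's loop: re-select a high-level option only when the current one ends."""
    current = choose_option(obs, options)
    for _ in range(horizon):
        obs = env_step(obs, current.act(obs))
        if current.terminates(obs):
            # Agents whose options keep running do not have to re-decide here,
            # which is what makes the multi-agent setting asynchronous.
            current = choose_option(obs, options)
    return obs

# Dummy usage: the observation is a counter and one option keeps incrementing it.
advance = Option("advance", act_fn=lambda o: 1, terminate_fn=lambda o: o % 5 == 0)
print(run_agent_with_options(env_step=lambda o, a: o + a,
                             choose_option=lambda o, opts: opts[0],
                             options=[advance], obs=0))
```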
arXiv Detail & Related papers (2022-03-29T22:02:28Z) - Constructing a Good Behavior Basis for Transfer using Generalized Policy
Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
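The instantaneous transfer claim is in the spirit of generalized policy improvement (GPI): given each known policy's Q-values evaluated on the new task, the agent acts greedily with respect to their pointwise maximum. The sketch below shows that generic selection rule; the Q-value array is a placeholder, and this is not necessarily the paper's exact construction.

```python
import numpy as np

def gpi_action(q_values):
    """Generalized policy improvement: act greedily w.r.t. the best policy in the set.

    q_values: array of shape (n_policies, n_actions) holding each known policy's
    Q-values for the current state, evaluated on (or transferred to) the new task.
    """
    return int(np.argmax(q_values.max(axis=0)))

# Example: 3 known policies, 4 actions; the agent picks the action whose
# best-case value across the policy set is largest.
q = np.array([[1.0, 0.2, 0.0, 0.3],
              [0.1, 0.4, 1.5, 0.0],
              [0.2, 0.9, 0.3, 0.8]])
print(gpi_action(q))   # -> 2
```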
arXiv Detail & Related papers (2021-12-30T12:20:46Z) - Contrastive Explanations for Comparing Preferences of Reinforcement
Learning Agents [16.605295052893986]
In complex tasks where the reward function is not straightforward, multiple reinforcement learning (RL) policies can be trained by adjusting the impact of individual objectives on the reward function.
In this work we compare behavior of two policies trained on the same task, but with different preferences in objectives.
We propose a method for distinguishing differences in behavior that stem from different abilities from those that are a consequence of the two RL agents' opposing preferences.
arXiv Detail & Related papers (2021-12-17T11:57:57Z) - Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through interactions among agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
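Saying the joint policy is a Markov Random Field means the agents' action distribution factors into local potentials over neighboring agents, so a joint action can be drawn with local updates instead of enumerating the exponential joint action space. The sketch below is a generic Gibbs sampler over unary and pairwise log-potentials, given as an assumption about one way such sampling can work, not as VPP's actual differentiable layers.

```python
import numpy as np

def gibbs_sample_joint_action(unary, pair, edges, n_sweeps=10, rng=np.random.default_rng(0)):
    """Sample a joint action from an MRF with unary and pairwise log-potentials.

    unary: (n_agents, n_actions) log-potentials per agent.
    pair:  dict mapping an edge (i, j) to an (n_actions, n_actions) log-potential.
    edges: list of undirected edges (i, j) between neighboring agents.
    """
    n_agents, n_actions = unary.shape
    actions = rng.integers(n_actions, size=n_agents)
    for _ in range(n_sweeps):
        for i in range(n_agents):
            logits = unary[i].copy()
            for (a, b) in edges:
                if a == i:
                    logits += pair[(a, b)][:, actions[b]]
                elif b == i:
                    logits += pair[(a, b)][actions[a], :]
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            actions[i] = rng.choice(n_actions, p=probs)
    return actions

# Two agents, two actions each, with a pairwise potential that rewards matching actions.
unary = np.zeros((2, 2))
pair = {(0, 1): np.array([[1.0, -1.0], [-1.0, 1.0]])}
print(gibbs_sample_joint_action(unary, pair, edges=[(0, 1)]))
```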
arXiv Detail & Related papers (2020-04-19T15:42:55Z)