Measuring Policy Distance for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2401.11257v2
- Date: Sun, 28 Jan 2024 15:37:54 GMT
- Title: Measuring Policy Distance for Multi-Agent Reinforcement Learning
- Authors: Tianyi Hu, Zhiqiang Pu, Xiaolin Ai, Tenghai Qiu, Jianqiang Yi
- Abstract summary: We propose the multi-agent policy distance (MAPD), a tool for measuring policy differences in multi-agent reinforcement learning (MARL).
By learning the conditional representations of agents' decisions, MAPD can compute the policy distance between any pair of agents.
We also extend MAPD to a customizable version, which can quantify differences among agent policies on specified aspects.
- Score: 9.80588687020087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diversity plays a crucial role in improving the performance of multi-agent
reinforcement learning (MARL). Currently, many diversity-based methods have
been developed to overcome the drawbacks of excessive parameter sharing in
traditional MARL. However, there remains a lack of a general metric to quantify
policy differences among agents. Such a metric would not only facilitate the
evaluation of the diversity evolution in multi-agent systems, but also provide
guidance for the design of diversity-based MARL algorithms. In this paper, we
propose the multi-agent policy distance (MAPD), a general tool for measuring
policy differences in MARL. By learning the conditional representations of
agents' decisions, MAPD can compute the policy distance between any pair of
agents. Furthermore, we extend MAPD to a customizable version, which can
quantify differences among agent policies on specified aspects. Based on the
online deployment of MAPD, we design a multi-agent dynamic parameter sharing
(MADPS) algorithm as an example of the MAPD's applications. Extensive
experiments demonstrate that our method is effective in measuring differences
in agent policies and specific behavioral tendencies. Moreover, in comparison
to other methods of parameter sharing, MADPS exhibits superior performance.
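As a rough illustration of how such a policy distance might be computed and used, the Python sketch below approximates the pairwise distance with the closed-form 2-Wasserstein distance between Gaussian latent representations of each agent's decisions, then greedily groups agents whose distance stays under a threshold. The Gaussian assumption, the 2-Wasserstein metric, the greedy grouping rule, and all names (`gaussian_w2`, `policy_distance`, `dynamic_sharing_groups`) are illustrative assumptions, not the paper's actual MAPD/MADPS formulation.

```python
import numpy as np

def gaussian_w2(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians
    given their means and standard deviation vectors (illustrative choice)."""
    mean_term = np.sum((mu_p - mu_q) ** 2)
    std_term = np.sum((sigma_p - sigma_q) ** 2)
    return np.sqrt(mean_term + std_term)

def policy_distance(encoder_i, encoder_j, transitions):
    """Average latent-space distance between two agents' conditional
    decision representations over a shared batch of (obs, act) pairs.
    Each encoder is assumed to map (obs, act) -> (mu, sigma) arrays."""
    dists = []
    for obs, act in transitions:
        mu_i, sigma_i = encoder_i(obs, act)  # agent i's conditional representation
        mu_j, sigma_j = encoder_j(obs, act)  # agent j's conditional representation
        dists.append(gaussian_w2(mu_i, sigma_i, mu_j, sigma_j))
    return float(np.mean(dists))

def dynamic_sharing_groups(encoders, transitions, threshold):
    """Greedy grouping: agents whose pairwise policy distance stays below
    `threshold` are placed in the same parameter-sharing group."""
    groups = []
    for i in range(len(encoders)):
        for group in groups:
            if all(policy_distance(encoders[i], encoders[j], transitions) < threshold
                   for j in group):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups
```

In this sketch, agents that land in the same group would share one policy network while distant agents keep independent parameters, loosely mirroring the dynamic parameter sharing idea that MADPS builds on top of the online policy distance.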
Related papers
- Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning [8.905920197601173]
We introduce Diversity Control (DiCo), a method that controls diversity to an exact value of a given metric.
We show how DiCo can be employed as a novel paradigm to increase performance and sample efficiency in Multi-Agent Reinforcement Learning.
arXiv Detail & Related papers (2024-05-23T21:03:33Z) - Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning [3.249853429482705]
Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a reasoning paradigm where agents anticipate the learning steps of other agents to improve cooperation among themselves.
Existing HOG methods are based on policy parameter anticipation, i.e., agents anticipate the changes in policy parameters of other agents.
We propose Off-Policy Action Anticipation (OffPA2), a novel framework that approaches learning anticipation through action anticipation.
arXiv Detail & Related papers (2023-04-04T01:44:19Z) - Parameter Sharing with Network Pruning for Scalable Multi-Agent Deep Reinforcement Learning [20.35644044703191]
We propose a simple method that adopts structured pruning for a deep neural network to increase the representational capacity of the joint policy without introducing additional parameters.
We evaluate the proposed method on several benchmark tasks, and numerical results show that the proposed method significantly outperforms other parameter-sharing methods.
arXiv Detail & Related papers (2023-03-02T02:17:14Z) - RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, boosting performance by up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z) - Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
These decomposed factors can significantly impact policy optimization along three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z) - Settling the Variance of Multi-Agent Policy Gradients [14.558011059649543]
Policy gradient (PG) methods are popular reinforcement learning (RL) methods.
In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG methods degrades as the variance of gradient estimates increases rapidly with the number of agents.
We offer a rigorous analysis of MAPG methods by quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators.
We propose a surrogate version of the optimal baseline (OB), which can be seamlessly plugged into any existing PG method in MARL.
arXiv Detail & Related papers (2021-08-19T10:49:10Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z) - Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients [51.749831824106046]
We introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods.
We show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.
arXiv Detail & Related papers (2021-04-27T19:37:01Z) - Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through interactions among agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.