RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2210.09646v1
- Date: Tue, 18 Oct 2022 07:32:43 GMT
- Title: RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning
- Authors: Wei Qiu, Xiao Ma, Bo An, Svetlana Obraztsova, Shuicheng Yan, Zhongwen Xu
- Abstract summary: We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete the given tasks, boosting performance by up to 402% on average.
- Score: 90.43925357575543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite recent advances in multi-agent reinforcement learning (MARL),
MARL agents easily overfit the training environment and perform poorly in
evaluation scenarios where other agents behave differently. Obtaining
generalizable policies for MARL agents is thus necessary but challenging, mainly
due to complex multi-agent interactions. In this work, we model the problem
with Markov Games and propose a simple yet effective method, ranked policy
memory (RPM), to collect diverse multi-agent trajectories for training MARL
policies with good generalizability. The main idea of RPM is to maintain a
look-up memory of policies. In particular, we acquire various levels of
behavior by ranking policies according to their training episode return, i.e., the
episode return of the agents in the training environment, and saving them in the
memory; when an episode starts, the learning agent can then choose a policy from
the RPM as the behavior policy. This self-play training framework leverages
agents' past policies and guarantees the diversity of multi-agent interactions
in the training data. We implement RPM on top of MARL algorithms and conduct
extensive experiments on Melting Pot. The results demonstrate that RPM enables
MARL agents to interact with unseen agents in multi-agent generalization
evaluation scenarios and complete the given tasks, boosting performance by up to
402% on average.
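The memory-and-sampling mechanism described in the abstract is concrete enough to sketch. Below is a minimal Python sketch, not the authors' released implementation: policy checkpoints are filed under the rank of the training episode return that produced them, and at episode start a behavior policy is drawn uniformly over ranks. The class and method names (RankedPolicyMemory, save, sample), the fixed return range, and the uniform sampling scheme are all illustrative assumptions.

```python
import random
from collections import defaultdict


class RankedPolicyMemory:
    """Minimal sketch of a ranked policy memory (RPM).

    Policy checkpoints are bucketed by the rank of the training episode
    return that produced them; at episode start, a behavior policy is
    sampled uniformly over ranks, so behaviors of all skill levels keep
    appearing in the training data.
    """

    def __init__(self, num_ranks=10, min_return=0.0, max_return=100.0):
        self.num_ranks = num_ranks        # number of return buckets (assumed)
        self.min_return = min_return      # assumed known return range
        self.max_return = max_return
        self.memory = defaultdict(list)   # rank index -> policy checkpoints

    def _rank(self, episode_return):
        # Map an episode return to one of num_ranks buckets.
        frac = (episode_return - self.min_return) / (self.max_return - self.min_return)
        return max(0, min(self.num_ranks - 1, int(frac * self.num_ranks)))

    def save(self, policy_checkpoint, episode_return):
        # Rank the training episode return and store the checkpoint under it.
        self.memory[self._rank(episode_return)].append(policy_checkpoint)

    def sample(self):
        # Pick a rank uniformly, then a checkpoint within it; used as the
        # behavior policy when a new episode starts.
        if not self.memory:
            return None  # memory is still empty early in training
        rank = random.choice(list(self.memory.keys()))
        return random.choice(self.memory[rank])


if __name__ == "__main__":
    rpm = RankedPolicyMemory(num_ranks=5, min_return=0.0, max_return=50.0)
    # Toy checkpoints; in practice these would be network parameters.
    for step, ret in enumerate([3.0, 12.5, 27.0, 41.0, 49.9]):
        rpm.save({"step": step}, episode_return=ret)
    print(rpm.sample())  # behavior policy for the next training episode
```

How the sampled checkpoint is assigned to agents during data collection follows the paper; the sketch only illustrates that sampling uniformly over return ranks keeps both weak and strong past behaviors in the training mix, rather than only the latest policy.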
Related papers
- PPS-QMIX: Periodically Parameter Sharing for Accelerating Convergence of
Multi-Agent Reinforcement Learning [20.746383793882984]
Training for multi-agent reinforcement learning (MARL) is a time-consuming process.
One drawback is that each agent's strategy in MARL is learned independently, even though the agents actually act in cooperation.
We propose three simple approaches: Average Sharing (A-PPS), Reward-Scalability Periodically, and Partial Personalized Periodically.
arXiv Detail & Related papers (2024-03-05T03:59:01Z)
- Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [53.510942601223626]
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks.
However, these task solvers require manually crafted prompts to convey task rules and regulate behaviors.
We propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization.
arXiv Detail & Related papers (2024-02-27T15:09:20Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
Because some behavior policies in such datasets can be low-quality (e.g., random), an agent learned by offline MARL often inherits this random policy, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to role diversity.
The decomposed factors can significantly impact policy optimization along three popular research directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning [12.170248966278281]
In multi-agent reinforcement learning, behaviors that agents learn in a single Markov Game (MG) are typically confined to the given number of agents.
In this work, our focus is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games.
arXiv Detail & Related papers (2021-08-30T04:30:53Z)
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
arXiv Detail & Related papers (2020-10-31T22:50:21Z)
- Parallel Knowledge Transfer in Multi-Agent Reinforcement Learning [0.2538209532048867]
This paper proposes PAT (Parallel Attentional Transfer), a novel knowledge transfer framework in MARL.
We design two acting modes in PAT, student mode and self-learning mode.
When agents are unfamiliar with the environment, the shared attention mechanism in student mode effectively selects learning knowledge from other agents to decide agents' actions.
arXiv Detail & Related papers (2020-03-29T17:42:00Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)