Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations
for Flocking Control
- URL: http://arxiv.org/abs/2209.08351v1
- Date: Sat, 17 Sep 2022 15:24:37 GMT
- Title: Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations
for Flocking Control
- Authors: Yunbo Qiu, Yuzhu Zhan, Yue Jin, Jian Wang, Xudong Zhang
- Abstract summary: Flocking control is a significant problem in multi-agent systems such as unmanned aerial vehicles and autonomous underwater vehicles.
In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the problem of flocking control more flexibly.
We propose a novel method, Pretraining with Demonstrations for MARL (PwD-MARL), which utilizes non-expert demonstrations collected in advance with traditional methods to pretrain agents.
- Score: 6.398557794102739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Flocking control is a significant problem in multi-agent systems such as
multi-agent unmanned aerial vehicles and multi-agent autonomous underwater
vehicles; it enhances the cooperation and safety of agents. In contrast to
traditional methods, multi-agent reinforcement learning (MARL) solves the
problem of flocking control more flexibly. However, methods based on MARL
suffer from sample inefficiency, since they require a large number of
experiences to be collected from interactions between agents and the
environment. We propose a novel method, Pretraining with Demonstrations for
MARL (PwD-MARL), which can utilize non-expert demonstrations collected in
advance with traditional methods to pretrain agents. During pretraining,
agents learn policies from the demonstrations via MARL and behavior cloning
simultaneously, while being prevented from overfitting to the demonstrations.
By pretraining with non-expert demonstrations, PwD-MARL improves sample
efficiency in subsequent online MARL by giving it a warm start. Experiments
show that PwD-MARL improves both sample efficiency and policy performance on
the flocking control problem, even with poor or few demonstrations.
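The pretraining scheme described in the abstract, learning from demonstrations via MARL and behavior cloning simultaneously, can be sketched as a combined objective. The sketch below is illustrative only and not the paper's actual implementation: the function and parameter names (such as `combined_pretraining_loss` and `bc_weight`) are hypothetical. It mixes a policy-gradient term weighted by critic advantages with a behavior-cloning negative log-likelihood term over demonstration actions.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_pretraining_loss(logits, demo_actions, advantages, bc_weight=0.5):
    """Combine a policy-gradient loss with a behavior-cloning loss.

    logits:       (batch, n_actions) policy logits at demonstration states
    demo_actions: (batch,) integer actions taken in the demonstrations
    advantages:   (batch,) advantage estimates from the MARL critic
    bc_weight:    trade-off coefficient between the two terms (hypothetical)
    """
    probs = softmax(logits)
    # Log-probability the current policy assigns to the demonstrated actions.
    log_p = np.log(probs[np.arange(len(demo_actions)), demo_actions])
    rl_loss = -(log_p * advantages).mean()  # policy-gradient (MARL) term
    bc_loss = -log_p.mean()                 # behavior-cloning (NLL) term
    return rl_loss + bc_weight * bc_loss
```

Keeping `bc_weight` moderate is one plausible way to avoid overfitting to non-expert demonstrations, since the RL term still dominates the update direction.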
Related papers
- Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning [54.40927310957792]
We introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team.
These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements.
We propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate.
arXiv Detail & Related papers (2024-03-13T20:11:20Z)
- MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z)
- MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
Diffusion models (DMs) have recently achieved huge success in various scenarios, including offline reinforcement learning.
We propose MADiff, a novel generative multi-agent learning framework, to tackle offline multi-agent learning.
Our experiments show the superior performance of MADiff compared to baseline algorithms in a wide range of multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z)
- DEFENDER: DTW-Based Episode Filtering Using Demonstrations for Enhancing RL Safety [0.0]
We propose a task-agnostic method that leverages small sets of safe and unsafe demonstrations to improve the safety of RL agents during learning.
We evaluate our method on three tasks from OpenAI Gym's MuJoCo benchmark and with two state-of-the-art RL algorithms.
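The DTW-based comparison named in the DEFENDER title can be illustrated with a minimal dynamic time warping distance. The sketch below is a generic DTW implementation for 1-D trajectories under the assumption that episodes are compared point-wise against stored demonstrations; the function name is hypothetical and this is not the paper's actual filtering code.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D trajectories.

    A quadratic-time DTW that aligns trajectory `a` against `b`,
    allowing local stretching/compression in time.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # step in both
    return cost[n, m]
```

A filtering scheme could then keep an episode whose DTW distance to the nearest safe demonstration is below a threshold, or flag it if it is closer to an unsafe one.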
arXiv Detail & Related papers (2023-05-08T14:23:27Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, boosting performance by up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
- Efficient Reinforcement Learning from Demonstration Using Local Ensemble and Reparameterization with Split and Merge of Expert Policies [7.126594773940676]
A policy learned from sub-optimal demonstrations may mislead an agent with incorrect or non-local action decisions.
We propose a new method called Local Ensemble and Reparameterization with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and make better use of sub-optimal demonstrations.
We demonstrate the superiority of the LEARN-SAM method and its robustness with varying demonstration quality and sparsity in six experiments on complex continuous control problems of low to high dimensions.
arXiv Detail & Related papers (2022-05-23T03:36:24Z)
- Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning [20.401609420707867]
We propose a distributed formation and obstacle avoidance method based on multi-agent reinforcement learning (MARL).
Our method achieves better formation error and formation convergence rate, with an on-par obstacle-avoidance success rate, compared with baselines.
arXiv Detail & Related papers (2021-11-14T13:02:45Z)
- Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot [71.28884625011987]
Melting Pot is a MARL evaluation suite that uses reinforcement learning to reduce the human labor required to create novel test scenarios.
We have created over 80 unique test scenarios covering a broad range of research topics.
We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
arXiv Detail & Related papers (2021-07-14T17:22:14Z)
- SAFARI: Safe and Active Robot Imitation Learning with Imagination [16.967930721746676]
SAFARI is a novel active learning and control algorithm.
It allows an agent to request further human demonstrations when out-of-distribution situations are encountered.
We show how this method enables the agent to autonomously predict failure rapidly and safely.
arXiv Detail & Related papers (2020-11-18T23:43:59Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.