Multi-Agent Generative Adversarial Interactive Self-Imitation Learning
for AUV Formation Control and Obstacle Avoidance
- URL: http://arxiv.org/abs/2401.11378v1
- Date: Sun, 21 Jan 2024 03:01:00 GMT
- Title: Multi-Agent Generative Adversarial Interactive Self-Imitation Learning
for AUV Formation Control and Obstacle Avoidance
- Authors: Zheng Fang, Tianhao Chen, Dong Jiang, Zheng Zhang and Guangliang Li
- Abstract summary: This paper builds upon the MAGAIL algorithm by proposing multi-agent generative adversarial interactive self-imitation learning (MAGAISIL).
Our experimental results in a multi-AUV formation control and obstacle avoidance task show that AUVs trained via MAGAISIL can surpass the provided sub-optimal expert demonstrations.
- Score: 10.834762022842353
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multiple autonomous underwater vehicles (multi-AUV) can cooperatively accomplish tasks that a single AUV cannot complete. Recently, multi-agent reinforcement learning has been introduced for multi-AUV control. However, designing efficient reward functions for the various tasks in multi-AUV control is difficult or even impractical. Multi-agent generative adversarial imitation learning (MAGAIL) allows multiple AUVs to learn from expert demonstrations instead of pre-defined reward functions, but it requires optimal demonstrations and cannot surpass the demonstrations it is given. This paper builds on MAGAIL by proposing multi-agent generative adversarial interactive self-imitation learning (MAGAISIL), in which AUVs learn policies while gradually replacing the provided sub-optimal demonstrations with self-generated good trajectories selected by a human trainer. Our experimental results on a multi-AUV formation control and obstacle avoidance task, run on the Gazebo platform with our lab's AUV simulator, show that AUVs trained via MAGAISIL can surpass the provided sub-optimal expert demonstrations and reach performance close to, or even better than, MAGAIL trained with optimal demonstrations. Further results indicate that policies trained via MAGAISIL can adapt to complex and varied tasks as well as MAGAIL policies learned from optimal demonstrations.
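Below is a minimal, single-agent sketch of the two mechanisms the abstract describes: a GAIL-style discriminator that supplies the learning signal, and an interactive self-imitation step in which trajectories selected by a human trainer gradually replace the provided sub-optimal demonstrations. All names (`Discriminator`, `imitation_reward`, `update_demo_buffer`) and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # assumed AUV observation/action sizes

class Discriminator(nn.Module):
    """Scores (state, action) pairs: a high logit means expert-like."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def imitation_reward(disc, obs, act):
    # GAIL surrogate reward r(s, a) = -log(1 - sigmoid(D(s, a))),
    # written in a numerically stable form.
    return -torch.nn.functional.logsigmoid(-disc(obs, act))

def update_demo_buffer(demo_buffer, rollout, trainer_approves):
    """Interactive self-imitation: a trajectory the human trainer marks
    as good replaces the oldest provided sub-optimal demonstration."""
    if trainer_approves(rollout):
        demo_buffer.pop(0)           # forget one sub-optimal demo
        demo_buffer.append(rollout)  # keep the trainer-selected trajectory
    return demo_buffer
```

In the multi-agent case each AUV would hold its own policy while the discriminator is trained to separate expert from policy trajectories; the on-policy update (e.g., TRPO or PPO) that consumes `imitation_reward` is omitted here.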
Related papers
- Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker [9.6508237676589]
A major bottleneck in imitation learning is its requirement for a large number of expert demonstrations.
We propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR).
ILMAR applies weighted behavior cloning (weighted BC) to a limited set of expert demonstrations together with supplementary demonstrations; a minimal sketch of the weighted loss follows this entry.
arXiv Detail & Related papers (2024-12-28T16:06:44Z)
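A hedged sketch of the weighted behavior cloning the summary above mentions: each demonstration sample is weighted before entering the cloning loss, so supplementary demonstrations can count for more or less. The tiny categorical policy and the `weights` tensor (a stand-in for ILMAR's meta-learned action ranker) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Minimal categorical policy, only here to make the loss concrete."""
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Linear(obs_dim, n_actions)

    def log_prob(self, obs, act):
        logits = self.net(obs)
        return torch.distributions.Categorical(logits=logits).log_prob(act)

def weighted_bc_loss(policy, obs, act, weights):
    # Standard behavior-cloning NLL, scaled per sample by the ranker's weight.
    return -(weights * policy.log_prob(obs, act)).mean()
```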
- Sample-efficient Unsupervised Policy Cloning from Ensemble Self-supervised Labeled Videos [4.6949816706255065]
Current advanced policy learning methodologies have demonstrated the ability to develop expert-level strategies when provided with enough information.
Humans can efficiently acquire skills within a few trials by imitating easily accessible internet videos, without any other supervision.
In this paper, we try to let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos.
arXiv Detail & Related papers (2024-12-14T10:12:22Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning [54.40927310957792]
We introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team.
These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements.
We propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate.
arXiv Detail & Related papers (2024-03-13T20:11:20Z)
- Universal Visual Decomposer: Long-Horizon Manipulation Made Easy [54.93745986073738]
Real-world robotic tasks stretch over extended horizons and encompass multiple stages.
Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks.
We propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long-horizon manipulation.
We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings.
arXiv Detail & Related papers (2023-10-12T17:59:41Z)
- Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control [6.398557794102739]
Flocking control is a significant problem in multi-agent systems such as unmanned aerial vehicles and autonomous underwater vehicles.
In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the problem of flocking control more flexibly.
We propose a novel method, Pretraining with Demonstrations for MARL (PwD-MARL), which can utilize non-expert demonstrations collected in advance with traditional methods to pretrain agents; a warm-start sketch follows this entry.
arXiv Detail & Related papers (2022-09-17T15:24:37Z)
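A hedged two-stage sketch of the pretraining scheme summarized above: behavior cloning on non-expert demonstrations warm-starts a policy before ordinary online MARL takes over. The demo format, discrete actions, and hyperparameters are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

def pretrain_with_demos(policy, demos, epochs=10, lr=1e-3):
    """Supervised warm start: fit the policy to demonstrated actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, act in demos:  # demos: iterable of (obs_batch, action_batch)
            opt.zero_grad()
            loss_fn(policy(obs), act).backward()
            opt.step()
    return policy  # hand off to the usual online MARL loop afterwards
```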
- GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback [6.367592686247906]
We propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback.
We tested the proposed method in six physics-based control tasks; a sketch of the combined reward signal follows this entry.
arXiv Detail & Related papers (2021-04-14T02:58:51Z)
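An illustrative composition of the two signals GAIRL is summarized as combining: a GAN discriminator score learned from demonstrations plus scalar human evaluative feedback. The blending weight `beta` is an assumed hyperparameter, not from the paper.

```python
import torch

def gairl_reward(disc_logit, human_feedback, beta=0.5):
    # GAN-derived reward -log(1 - sigmoid(D(s, a))), numerically stable form.
    disc_reward = -torch.nn.functional.logsigmoid(-disc_logit)
    # Blend with the trainer's scalar evaluative feedback.
    return beta * disc_reward + (1.0 - beta) * human_feedback
```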
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent decision process more explainable; a toy sketch follows this entry.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
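A toy illustration in the spirit of the transformer-based policy described above: per-entity observation tokens pass through a transformer encoder and action logits are decoded from the agent's own token, so one set of weights can fit tasks with different entity counts. All dimensions are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class TransformerPolicy(nn.Module):
    def __init__(self, token_dim=16, model_dim=32, n_actions=5):
        super().__init__()
        self.embed = nn.Linear(token_dim, model_dim)
        layer = nn.TransformerEncoderLayer(model_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(model_dim, n_actions)

    def forward(self, tokens):                # tokens: (batch, entities, token_dim)
        h = self.encoder(self.embed(tokens))  # self-attention across entities
        return self.head(h[:, 0])             # decode the agent's own token
```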
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allows the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations; a speculative buffer sketch follows this entry.
The solution based on our algorithm beats all other solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
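A speculative sketch of a goal-oriented, "forgetful" replay buffer consistent with the summary above: episodes are grouped by the sub-goal they achieve, and bounded queues drop the oldest entries first so recent experience dominates. The structure is an assumption, not the paper's implementation.

```python
from collections import defaultdict, deque

class ForgetfulGoalBuffer:
    """One bounded queue per sub-goal; full queues forget oldest episodes."""
    def __init__(self, per_goal_capacity=100):
        self.by_goal = defaultdict(lambda: deque(maxlen=per_goal_capacity))

    def add(self, goal, episode):
        self.by_goal[goal].append(episode)

    def latest(self, goal):
        episodes = self.by_goal[goal]
        return episodes[-1] if episodes else None  # most recent for that sub-goal
```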
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning is effective in sparse-reward tasks because it leverages existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)