Multi-Agent Generative Adversarial Interactive Self-Imitation Learning
for AUV Formation Control and Obstacle Avoidance
- URL: http://arxiv.org/abs/2401.11378v1
- Date: Sun, 21 Jan 2024 03:01:00 GMT
- Title: Multi-Agent Generative Adversarial Interactive Self-Imitation Learning
for AUV Formation Control and Obstacle Avoidance
- Authors: Zheng Fang, Tianhao Chen, Dong Jiang, Zheng Zhang and Guangliang Li
- Abstract summary: This paper builds upon the MAGAIL algorithm by proposing multi-agent generative adversarial interactive self-imitation learning (MAGAISIL).
Our experimental results in a multi-AUV formation control and obstacle avoidance task show that AUVs trained via MAGAISIL can surpass the provided sub-optimal expert demonstrations.
- Score: 10.834762022842353
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multiple autonomous underwater vehicles (multi-AUV) can cooperatively accomplish tasks that a single AUV cannot complete. Recently, multi-agent reinforcement learning has been introduced for multi-AUV control. However, designing efficient reward functions for the various tasks in multi-AUV control is difficult or even impractical. Multi-agent generative adversarial imitation learning (MAGAIL) allows multiple AUVs to learn from expert demonstrations instead of pre-defined reward functions, but it requires optimal demonstrations and cannot surpass the demonstrations it is given. This paper builds on MAGAIL by proposing multi-agent generative adversarial interactive self-imitation learning (MAGAISIL), in which AUVs learn policies while gradually replacing the provided sub-optimal demonstrations with self-generated good trajectories selected by a human trainer. Our experimental results on a multi-AUV formation control and obstacle avoidance task, run on the Gazebo platform with our lab's AUV simulator, show that AUVs trained via MAGAISIL can surpass the provided sub-optimal expert demonstrations and reach performance close to, or even better than, MAGAIL trained with optimal demonstrations. Further results indicate that policies trained via MAGAISIL can adapt to complex and varied tasks as well as MAGAIL policies learned from optimal demonstrations.
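Below is a minimal, single-agent sketch of the two mechanisms the abstract describes: a GAIL-style discriminator that supplies the learning signal, and an interactive self-imitation step in which trajectories selected by a human trainer gradually replace the provided sub-optimal demonstrations. All names (`Discriminator`, `imitation_reward`, `update_demo_buffer`) and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 8, 2  # assumed AUV observation/action sizes

class Discriminator(nn.Module):
    """Scores (state, action) pairs: a high logit means expert-like."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def imitation_reward(disc, obs, act):
    # GAIL surrogate reward r(s, a) = -log(1 - sigmoid(D(s, a))),
    # written in a numerically stable form.
    return -torch.nn.functional.logsigmoid(-disc(obs, act))

def update_demo_buffer(demo_buffer, rollout, trainer_approves):
    """Interactive self-imitation: a trajectory the human trainer marks
    as good replaces the oldest provided sub-optimal demonstration."""
    if trainer_approves(rollout):
        demo_buffer.pop(0)           # forget one sub-optimal demo
        demo_buffer.append(rollout)  # keep the trainer-selected trajectory
    return demo_buffer
```

In the multi-agent case each AUV would hold its own policy while the discriminator is trained to separate expert from policy trajectories; the on-policy update (e.g., TRPO or PPO) that consumes `imitation_reward` is omitted here.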
Related papers
- Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker [9.6508237676589]
A major bottleneck in imitation learning is its requirement for a large number of expert demonstrations.
We propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR).
ILMAR applies weighted behavior cloning (weighted BC) to a limited set of expert demonstrations together with supplementary demonstrations; a minimal sketch of the weighted loss follows this entry.
arXiv Detail & Related papers (2024-12-28T16:06:44Z)
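A hedged sketch of the weighted behavior cloning the summary above mentions: each demonstration sample is weighted before entering the cloning loss, so supplementary demonstrations can count for more or less. The tiny categorical policy and the `weights` tensor (a stand-in for ILMAR's meta-learned action ranker) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Minimal categorical policy, only here to make the loss concrete."""
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Linear(obs_dim, n_actions)

    def log_prob(self, obs, act):
        logits = self.net(obs)
        return torch.distributions.Categorical(logits=logits).log_prob(act)

def weighted_bc_loss(policy, obs, act, weights):
    # Standard behavior-cloning NLL, scaled per sample by the ranker's weight.
    return -(weights * policy.log_prob(obs, act)).mean()
```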
- Sample-efficient Unsupervised Policy Cloning from Ensemble Self-supervised Labeled Videos [4.6949816706255065]
Current advanced policy learning methodologies have demonstrated the ability to develop expert-level strategies when provided with enough information.
Humans can efficiently acquire skills within a few trials by imitating easily accessible internet videos, without any other supervision.
In this paper, we try to let machines replicate this efficient watching-and-learning process through Unsupervised Policy from Ensemble Self-supervised labeled Videos.
arXiv Detail & Related papers (2024-12-14T10:12:22Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning [54.40927310957792]
We introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team.
These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements.
We propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate.
arXiv Detail & Related papers (2024-03-13T20:11:20Z)
- Universal Visual Decomposer: Long-Horizon Manipulation Made Easy [54.93745986073738]
Real-world robotic tasks stretch over extended horizons and encompass multiple stages.
Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks.
We propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long-horizon manipulation.
We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings.
arXiv Detail & Related papers (2023-10-12T17:59:41Z)
- Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control [6.398557794102739]
Flocking control is a significant problem in multi-agent systems such as unmanned aerial vehicles and autonomous underwater vehicles.
In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the problem of flocking control more flexibly.
We propose a novel method, Pretraining with Demonstrations for MARL (PwD-MARL), which can utilize non-expert demonstrations collected in advance with traditional methods to pretrain agents; a warm-start sketch follows this entry.
arXiv Detail & Related papers (2022-09-17T15:24:37Z)
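A hedged two-stage sketch of the pretraining scheme summarized above: behavior cloning on non-expert demonstrations warm-starts a policy before ordinary online MARL takes over. The demo format, discrete actions, and hyperparameters are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

def pretrain_with_demos(policy, demos, epochs=10, lr=1e-3):
    """Supervised warm start: fit the policy to demonstrated actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, act in demos:  # demos: iterable of (obs_batch, action_batch)
            opt.zero_grad()
            loss_fn(policy(obs), act).backward()
            opt.step()
    return policy  # hand off to the usual online MARL loop afterwards
```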
- GAN-Based Interactive Reinforcement Learning from Demonstration and Human Evaluative Feedback [6.367592686247906]
We propose GAN-Based Interactive Reinforcement Learning (GAIRL) from demonstration and human evaluative feedback.
We tested the proposed method in six physics-based control tasks; a sketch of the combined reward signal follows this entry.
arXiv Detail & Related papers (2021-04-14T02:58:51Z)
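An illustrative composition of the two signals GAIRL is summarized as combining: a GAN discriminator score learned from demonstrations plus scalar human evaluative feedback. The blending weight `beta` is an assumed hyperparameter, not from the paper.

```python
import torch

def gairl_reward(disc_logit, human_feedback, beta=0.5):
    # GAN-derived reward -log(1 - sigmoid(D(s, a))), numerically stable form.
    disc_reward = -torch.nn.functional.logsigmoid(-disc_logit)
    # Blend with the trainer's scalar evaluative feedback.
    return beta * disc_reward + (1.0 - beta) * human_feedback
```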
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent decision process more explainable; a toy sketch follows this entry.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
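A toy illustration in the spirit of the transformer-based policy described above: per-entity observation tokens pass through a transformer encoder and action logits are decoded from the agent's own token, so one set of weights can fit tasks with different entity counts. All dimensions are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class TransformerPolicy(nn.Module):
    def __init__(self, token_dim=16, model_dim=32, n_actions=5):
        super().__init__()
        self.embed = nn.Linear(token_dim, model_dim)
        layer = nn.TransformerEncoderLayer(model_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(model_dim, n_actions)

    def forward(self, tokens):                # tokens: (batch, entities, token_dim)
        h = self.encoder(self.embed(tokens))  # self-attention across entities
        return self.head(h[:, 0])             # decode the agent's own token
```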
- Forgetful Experience Replay in Hierarchical Reinforcement Learning from Demonstrations [55.41644538483948]
In this paper, we propose a combination of approaches that allows the agent to use low-quality demonstrations in complex vision-based environments.
Our proposed goal-oriented structuring of the replay buffer allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations; a speculative buffer sketch follows this entry.
The solution based on our algorithm beats all other solutions from the well-known MineRL competition and allows the agent to mine a diamond in the Minecraft environment.
arXiv Detail & Related papers (2020-06-17T15:38:40Z)
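A speculative sketch of a goal-oriented, "forgetful" replay buffer consistent with the summary above: episodes are grouped by the sub-goal they achieve, and bounded queues drop the oldest entries first so recent experience dominates. The structure is an assumption, not the paper's implementation.

```python
from collections import defaultdict, deque

class ForgetfulGoalBuffer:
    """One bounded queue per sub-goal; full queues forget oldest episodes."""
    def __init__(self, per_goal_capacity=100):
        self.by_goal = defaultdict(lambda: deque(maxlen=per_goal_capacity))

    def add(self, goal, episode):
        self.by_goal[goal].append(episode)

    def latest(self, goal):
        episodes = self.by_goal[goal]
        return episodes[-1] if episodes else None  # most recent for that sub-goal
```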
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning is effective in sparse-reward tasks because it leverages existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)