Decentralized Multi-Agent Reinforcement Learning with Global State Prediction
- URL: http://arxiv.org/abs/2306.12926v2
- Date: Mon, 28 Aug 2023 17:33:56 GMT
- Title: Decentralized Multi-Agent Reinforcement Learning with Global State Prediction
- Authors: Joshua Bloom, Pranjal Paliwal, Apratim Mukherjee, Carlo Pinciroli
- Abstract summary: A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently.
We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge of other agents.
In the first, the robots exchange no messages, and are trained to rely on implicit communication through push-and-pull on the object to transport.
In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states.
- Score: 3.5843971648706296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning (DRL) has seen remarkable success in the control
of single robots. However, applying DRL to robot swarms presents significant
challenges. A critical challenge is non-stationarity, which occurs when two or
more robots update individual or shared policies concurrently, thereby engaging
in an interdependent training process with no guarantees of convergence.
Circumventing non-stationarity typically involves training the robots with
global information about other agents' states and/or actions. In contrast, in
this paper we explore how to remove the need for global information. We pose
our problem as a Partially Observable Markov Decision Process, due to the
absence of global knowledge of other agents. Using collective transport as a
testbed scenario, we study two approaches to multi-agent training. In the
first, the robots exchange no messages, and are trained to rely on implicit
communication through push-and-pull on the object to transport. In the second
approach, we introduce Global State Prediction (GSP), a network trained to
form a belief over the swarm as a whole and predict its future states. We
provide a comprehensive study over four well-known deep reinforcement learning
algorithms in environments with obstacles, measuring performance as the
successful transport of the object to the goal within a desired time-frame.
Through an ablation study, we show that including GSP boosts performance and
increases robustness when compared with methods that use global knowledge.
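To make the GSP idea concrete, here is a minimal sketch, assuming a PyTorch MLP, made-up dimensions, and a plain supervised regression loss; none of this is taken from the paper's actual implementation:

```python
import torch
import torch.nn as nn

# All dimensions below are illustrative assumptions, not values from the paper.
N_ROBOTS = 4    # swarm size
OBS_DIM = 8     # per-robot local observation size
STATE_DIM = 12  # global swarm state (e.g., object pose + robot poses)

class GlobalStatePredictor(nn.Module):
    """GSP-style network: form a belief over the swarm as a whole and
    predict its future global state from the robots' local observations."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_ROBOTS * OBS_DIM, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, STATE_DIM),
        )

    def forward(self, local_obs: torch.Tensor) -> torch.Tensor:
        # local_obs: (batch, N_ROBOTS * OBS_DIM) -> predicted next global state
        return self.net(local_obs)

# One supervised update: the true next global state is available during
# (simulated) training even though it is hidden at decentralized execution.
gsp = GlobalStatePredictor()
opt = torch.optim.Adam(gsp.parameters(), lr=1e-3)

obs = torch.randn(32, N_ROBOTS * OBS_DIM)  # stand-in batch of observations
next_state = torch.randn(32, STATE_DIM)    # stand-in ground-truth targets
loss = nn.functional.mse_loss(gsp(obs), next_state)
opt.zero_grad()
loss.backward()
opt.step()
```

At execution time, each robot's policy could then consume the predicted global state alongside its own observation, recovering some of the benefit of global knowledge without requiring it to be sensed or communicated directly.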
Related papers
- A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics [53.33976793493801]
We organized the Robot Air Hockey Challenge at the NeurIPS 2023 conference.
We focus on practical challenges in robotics, such as the sim-to-real gap, low-level control issues, safety problems, real-time requirements, and the limited availability of real-world data.
Results show that solutions combining learning-based approaches with prior knowledge outperform those relying solely on data when real-world deployment is challenging.
arXiv Detail & Related papers (2024-11-08T17:20:47Z) - Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning [72.86540018081531]
Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance.
This problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation.
We address this problem in a decentralized setting where each robot knows only the positions of its $k$-nearest robots and $k$-nearest targets; a small observation-construction sketch appears after this list.
arXiv Detail & Related papers (2024-09-29T23:57:25Z) - Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input [5.522800137785975]
We introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework.
The proposed system is designed to translate verbal inputs from human stakeholders into actionable RL insights and adjust its search strategy.
By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach significantly improves the agent's learning efficiency and decision-making process in environments characterised by long horizons and sparse rewards.
arXiv Detail & Related papers (2024-09-20T12:27:47Z) - Learning Vision-based Pursuit-Evasion Robot Policies [54.52536214251999]
We develop a fully-observable robot policy that generates supervision for a partially-observable one.
We deploy our policy on a physical quadruped robot with an RGB-D camera on pursuit-evasion interactions in the wild.
arXiv Detail & Related papers (2023-08-30T17:59:05Z) - Reinforcement Learning for UAV control with Policy and Reward Shaping [0.7127008801193563]
This study teaches an RL agent to control a drone using reward-shaping and policy-shaping techniques simultaneously.
The results show that an agent trained simultaneously with both techniques obtains a lower reward than an agent trained using only a policy-based approach.
arXiv Detail & Related papers (2022-12-06T14:46:13Z) - Don't Start From Scratch: Leveraging Prior Data to Automate Robotic
Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z) - Centralizing State-Values in Dueling Networks for Multi-Robot
Reinforcement Learning Mapless Navigation [87.85646257351212]
We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm.
This problem is challenging when each robot considers its path without explicitly sharing observations with other robots.
We propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value.
arXiv Detail & Related papers (2021-12-16T16:47:00Z) - Learning Connectivity for Data Distribution in Robot Teams [96.39864514115136]
We propose a task-agnostic, decentralized, low-latency method for data distribution in ad-hoc networks using Graph Neural Networks (GNNs).
Our approach enables multi-agent algorithms based on global state information to function by ensuring that this information is available at each robot.
We train the distributed GNN communication policies via reinforcement learning using the average Age of Information as the reward function and show that it improves training stability compared to task-specific reward functions.
arXiv Detail & Related papers (2021-03-08T21:48:55Z) - Deep Adversarial Reinforcement Learning for Object Disentangling [36.66974848126079]
We present a novel adversarial reinforcement learning (ARL) framework for disentangling waste objects.
The ARL framework utilizes an adversary, which is trained to steer the original agent, the protagonist, to challenging states.
We show that our method can generalize from training to test scenarios by training an end-to-end system for robot control to solve a challenging object disentangling task.
arXiv Detail & Related papers (2020-03-08T13:20:39Z)