Centralizing State-Values in Dueling Networks for Multi-Robot
Reinforcement Learning Mapless Navigation
- URL: http://arxiv.org/abs/2112.09012v1
- Date: Thu, 16 Dec 2021 16:47:00 GMT
- Title: Centralizing State-Values in Dueling Networks for Multi-Robot
Reinforcement Learning Mapless Navigation
- Authors: Enrico Marchesini, Alessandro Farinelli
- Abstract summary: We study the problem of multi-robot mapless navigation in the popular Centralized Training and Decentralized Execution (CTDE) paradigm.
This problem is challenging when each robot considers its path without explicitly sharing observations with other robots.
We propose a novel architecture for CTDE that uses a centralized state-value network to compute a joint state-value.
- Score: 87.85646257351212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of multi-robot mapless navigation in the popular
Centralized Training and Decentralized Execution (CTDE) paradigm. This problem
is challenging when each robot considers its path without explicitly sharing
observations with other robots, and can lead to non-stationarity issues in Deep
Reinforcement Learning (DRL). The typical CTDE algorithm factorizes the joint
action-value function into individual ones, to favor cooperation and achieve
decentralized execution. Such factorization involves constraints (e.g.,
monotonicity) that limit the emergence of novel behaviors in individual agents,
as each agent is trained starting from a joint action-value. In contrast, we
propose a novel architecture for CTDE that uses a centralized state-value
network to compute a joint state-value, which is used to inject global state
information in the value-based updates of the agents. Consequently, each model
computes its gradient update for the weights, considering the overall state of
the environment. Our idea follows the insight of Dueling Networks: a separate
estimate of the joint state-value both improves sample efficiency and provides
each robot with information on whether the global state is (or is not)
valuable. Experiments in a robotic navigation task with 2, 4, and 8 robots
confirm the superior performance of our approach over prior CTDE methods
(e.g., VDN, QMIX).
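To make the proposed architecture concrete, below is a minimal PyTorch sketch of a per-robot dueling head paired with a centralized state-value network, following the description in the abstract. The class names (DuelingAgent, CentralV), layer sizes, and the exact way the joint state-value enters the TD target are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DuelingAgent(nn.Module):
    """Per-robot dueling head: Q_i(o, a) = V_i(o) + A_i(o, a) - mean_a' A_i(o, a')."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # local state-value stream
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)    # Q-values, shape (..., n_actions)


class CentralV(nn.Module):
    """Centralized state-value network: maps the global state to a joint V(s).
    Used only during training (CTDE); execution needs just the agents."""

    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def td_target(reward: torch.Tensor, next_state: torch.Tensor,
              central_v: CentralV, gamma: float = 0.99) -> torch.Tensor:
    """Illustrative value-based target: bootstrapping from the joint
    state-value V(s') injects global state information into each agent's
    update (one plausible reading of the abstract)."""
    with torch.no_grad():
        return reward + gamma * central_v(next_state).squeeze(-1)
```

During training, each robot's value-based loss would regress its Q-values toward this shared target, so every gradient update reflects the overall state of the environment; at execution time each robot acts greedily on its own dueling head, with no communication required.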
Related papers
- Generalizability of Graph Neural Networks for Decentralized Unlabeled Motion Planning [72.86540018081531]
Unlabeled motion planning involves assigning a set of robots to target locations while ensuring collision avoidance.
This problem forms an essential building block for multi-robot systems in applications such as exploration, surveillance, and transportation.
We address this problem in a decentralized setting where each robot knows only the positions of its $k$-nearest robots and $k$-nearest targets.
arXiv Detail & Related papers (2024-09-29T23:57:25Z)
- Attention Graph for Multi-Robot Social Navigation with Deep Reinforcement Learning [0.0]
We present MultiSoc, a new method for learning multi-agent socially aware navigation strategies using deep reinforcement learning (RL).
Inspired by recent works on multi-agent deep RL, our method leverages a graph-based representation of agent interactions, combining the positions and fields of view of entities (pedestrians and agents).
Our method learns faster than social navigation deep RL mono-agent techniques, and enables efficient multi-agent implicit coordination in challenging crowd navigation with multiple heterogeneous humans.
arXiv Detail & Related papers (2024-01-31T15:24:13Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- Decentralized Multi-Agent Reinforcement Learning with Global State Prediction [3.5843971648706296]
A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently.
We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge on other agents.
In the first approach, the robots exchange no messages and are trained to rely on implicit communication through push-and-pull on the object they transport.
In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states.
arXiv Detail & Related papers (2023-06-22T14:38:12Z)
- Distributed Reinforcement Learning for Robot Teams: A Review [10.92709534981466]
Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems (MRS).
The community has leveraged model-free multi-agent reinforcement learning to devise efficient, scalable controllers for MRS.
Recent findings: decentralized MRS face fundamental challenges, such as non-stationarity and partial observability.
arXiv Detail & Related papers (2022-04-07T15:34:19Z)
- CTDS: Centralized Teacher with Decentralized Student for Multi-Agent Reinforcement Learning [114.69155066932046]
This work proposes a novel Centralized Teacher with Decentralized Student (CTDS) framework, which consists of a teacher model and a student model.
Specifically, the teacher model allocates the team reward by learning individual Q-values conditioned on global observation.
The student model utilizes the partial observations to approximate the Q-values estimated by the teacher model.
arXiv Detail & Related papers (2022-03-16T06:03:14Z)
- Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients [43.862956745961654]
LSF-SAC is a novel framework that features a variational inference-based information-sharing mechanism that supplies extra state information.
We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks.
arXiv Detail & Related papers (2022-01-04T17:05:07Z)
- Learning Connectivity for Data Distribution in Robot Teams [96.39864514115136]
We propose a task-agnostic, decentralized, low-latency method for data distribution in ad-hoc networks using Graph Neural Networks (GNN).
Our approach enables multi-agent algorithms based on global state information to function by ensuring it is available at each robot.
We train the distributed GNN communication policies via reinforcement learning using the average Age of Information as the reward function and show that it improves training stability compared to task-specific reward functions.
arXiv Detail & Related papers (2021-03-08T21:48:55Z)
- Robot Navigation in a Crowd by Integrating Deep Reinforcement Learning and Online Planning [8.211771115758381]
Navigating along time-efficient and collision-free paths in a crowd is still an open and challenging problem for mobile robots.
Deep reinforcement learning is a promising solution to this problem.
We propose a graph-based deep reinforcement learning method, SG-DQN.
Our model can help the robot better understand the crowd and achieve a high success rate of more than 0.99 in the crowd navigation task.
arXiv Detail & Related papers (2021-02-26T02:17:13Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
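For contrast with the factorization baselines discussed above (VDN, QMIX), the sketch below shows their usual formulations: VDN sums per-agent Q-values, while QMIX mixes them with state-conditioned hypernetwork weights whose absolute value enforces the monotonicity constraint $\partial Q_{tot} / \partial Q_i \geq 0$. Names, shapes, and layer sizes here are illustrative assumptions, not code from either paper.

```python
import torch
import torch.nn as nn


def vdn_mix(agent_qs: torch.Tensor) -> torch.Tensor:
    """VDN: Q_tot = sum_i Q_i. agent_qs has shape (batch, n_agents)."""
    return agent_qs.sum(dim=-1)


class QMixer(nn.Module):
    """QMIX-style mixer: hypernetworks produce mixing weights from the global
    state; taking their absolute value keeps dQ_tot/dQ_i >= 0 (monotonicity)."""

    def __init__(self, n_agents: int, state_dim: int, embed: int = 32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed)
        self.hyper_b1 = nn.Linear(state_dim, embed)
        self.hyper_w2 = nn.Linear(state_dim, embed)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        b = agent_qs.size(0)
        # First mixing layer: non-negative, state-conditioned weights.
        w1 = self.hyper_w1(state).abs().view(b, self.n_agents, self.embed)
        b1 = self.hyper_b1(state).view(b, 1, self.embed)
        h = torch.relu(agent_qs.view(b, 1, self.n_agents) @ w1 + b1)
        # Second mixing layer collapses the embedding to a scalar Q_tot.
        w2 = self.hyper_w2(state).abs().view(b, self.embed, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (h @ w2 + b2).view(b)  # Q_tot, shape (batch,)
```

It is precisely this monotonic mixing that the main abstract argues can limit the emergence of novel individual behaviors, which motivates replacing the factorized joint action-value with a centralized joint state-value instead.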
This list is automatically generated from the titles and abstracts of the papers on this site.