SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially
Observable Multi-Agent Path Finding
- URL: http://arxiv.org/abs/2307.02691v1
- Date: Wed, 5 Jul 2023 23:36:33 GMT
- Title: SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially
Observable Multi-Agent Path Finding
- Authors: Qiushi Lin, Hang Ma
- Abstract summary: We propose a novel multi-agent actor-critic method called Soft Actor-Critic with Heuristic-Based Attention (SACHA)
SACHA learns a neural network for each agent to selectively pay attention to the shortest path guidance from multiple agents within its field of view.
We demonstrate decent improvements over several state-of-the-art learning-based MAPF methods with respect to success rate and solution quality.
- Score: 3.4260993997836753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Agent Path Finding (MAPF) is a crucial component for many large-scale
robotic systems, where agents must plan their collision-free paths to their
given goal positions. Recently, multi-agent reinforcement learning has been
introduced to solve the partially observable variant of MAPF by learning a
decentralized single-agent policy in a centralized fashion based on each
agent's partial observation. However, existing learning-based methods are
ineffective in achieving complex multi-agent cooperation, especially in
congested environments, due to the non-stationarity of this setting. To tackle
this challenge, we propose a multi-agent actor-critic method called Soft
Actor-Critic with Heuristic-Based Attention (SACHA), which employs novel
heuristic-based attention mechanisms for both the actors and critics to
encourage cooperation among agents. SACHA learns a neural network for each
agent to selectively pay attention to the shortest path heuristic guidance from
multiple agents within its field of view, thereby allowing for more scalable
learning of cooperation. SACHA also extends the existing multi-agent
actor-critic framework by introducing a novel critic centered on each agent to
approximate $Q$-values. Compared to existing methods that use a fully
observable critic, our agent-centered multi-agent actor-critic method results
in more impartial credit assignment and better generalizability of the learned
policy to MAPF instances with varying numbers of agents and types of
environments. We also implement SACHA(C), which embeds a communication module
in the agent's policy network to enable information exchange among agents. We
evaluate both SACHA and SACHA(C) on a variety of MAPF instances and demonstrate
decent improvements over several state-of-the-art learning-based MAPF methods
with respect to success rate and solution quality.
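To make the heuristic-based attention concrete, the following is a minimal PyTorch sketch, not the authors' implementation: an ego agent attends over encoded shortest-path heuristic maps of the agents inside its field of view and receives a single guidance feature for its policy head. The module name `HeuristicAttention`, every tensor shape, and the embedding size are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeuristicAttention(nn.Module):
    """Hypothetical ego-centric attention over per-agent heuristic features."""

    def __init__(self, obs_dim: int, heur_dim: int, embed_dim: int = 128):
        super().__init__()
        self.query = nn.Linear(obs_dim, embed_dim)   # ego observation -> query
        self.key = nn.Linear(heur_dim, embed_dim)    # heuristic features -> keys
        self.value = nn.Linear(heur_dim, embed_dim)  # heuristic features -> values
        self.scale = embed_dim ** -0.5

    def forward(self, ego_obs, heur_feats, fov_mask):
        # ego_obs:    (B, obs_dim)      encoded partial observation of the ego agent
        # heur_feats: (B, N, heur_dim)  encoded shortest-path heuristic maps of N agents
        # fov_mask:   (B, N) bool       True where agent j lies inside the ego FOV;
        #                               the ego agent sees itself, so a row is never all False
        q = self.query(ego_obs).unsqueeze(1)           # (B, 1, E)
        k = self.key(heur_feats)                       # (B, N, E)
        v = self.value(heur_feats)                     # (B, N, E)
        logits = (q @ k.transpose(1, 2)) * self.scale  # (B, 1, N)
        logits = logits.masked_fill(~fov_mask.unsqueeze(1), float("-inf"))
        attn = F.softmax(logits, dim=-1)               # weights over visible agents
        return (attn @ v).squeeze(1)                   # (B, E) guidance feature
```

Under the same assumptions, the agent-centered critic described in the abstract could consume this guidance feature, concatenated with the ego embedding, to approximate that agent's $Q$-values, and SACHA(C) would exchange messages among visible agents before this attention step.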
Related papers
- Effective Multi-Agent Deep Reinforcement Learning Control with Relative
Entropy Regularization [6.441951360534903]
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP) was proposed to tackle the issues of limited capability and sample efficiency in various scenarios controlled by multiple agents.
It alleviates the inconsistency of multiple agents' policy updates by introducing relative entropy regularization into the Centralized Training with Decentralized Execution (CTDE) framework with an Actor-Critic (AC) structure.
arXiv Detail & Related papers (2023-09-26T07:38:19Z)
- Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning.
We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs multi-agent options by minimizing the expected cover time of the agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization [17.825845543579195]
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that trains this recurrent layer with a meta-trajectory.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
arXiv Detail & Related papers (2021-09-02T12:43:35Z)
- Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge? [100.48692829396778]
Independent PPO (IPPO) is a form of independent learning in which each agent simply estimates its local value function.
IPPO's strong performance may be due to its robustness to some forms of environment non-stationarity.
arXiv Detail & Related papers (2020-11-18T20:29:59Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion (a minimal sketch of its monotonic mixing idea appears after this list).
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
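For the QMIX entry above, here is a minimal sketch, not the official implementation, of the monotonic value function factorisation: a hypernetwork conditioned on the global state emits non-negative mixing weights, so the joint value is monotone in each per-agent utility and greedy decentralised action selection stays consistent with the centralised objective. Layer sizes and the module name `MonotonicMixer` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Hypothetical QMIX-style mixer: Q_tot is monotone in every agent's Q."""

    def __init__(self, n_agents: int, state_dim: int, hidden: int = 32):
        super().__init__()
        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetworks map the global state to mixing weights and biases.
        self.w1 = nn.Linear(state_dim, n_agents * hidden)
        self.b1 = nn.Linear(state_dim, hidden)
        self.w2 = nn.Linear(state_dim, hidden)
        self.b2 = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (B, n_agents) per-agent Q-values; state: (B, state_dim)
        B = agent_qs.size(0)
        # abs() keeps the weights non-negative, so dQ_tot/dQ_i >= 0 (monotonicity)
        w1 = self.w1(state).abs().view(B, self.n_agents, self.hidden)
        b1 = self.b1(state).view(B, 1, self.hidden)
        h = F.elu(agent_qs.unsqueeze(1) @ w1 + b1)   # (B, 1, hidden)
        w2 = self.w2(state).abs().view(B, self.hidden, 1)
        b2 = self.b2(state).view(B, 1, 1)
        return (h @ w2 + b2).view(B, 1)              # joint Q_tot
```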
This list is automatically generated from the titles and abstracts of the papers on this site.