Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding
- URL: http://arxiv.org/abs/2403.07559v2
- Date: Wed, 10 Jul 2024 08:36:48 GMT
- Title: Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding
- Authors: Huijie Tang, Federico Berto, Jinkyoo Park
- Abstract summary: Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability.
Several MARL-MAPF methods choose to use communication to enrich the information one agent can perceive.
We propose a new method, Ensembling Prioritized Hybrid Policies (EPH).
- Score: 18.06081009550052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability. Several MARL-MAPF methods choose to use communication to enrich the information one agent can perceive. However, existing works still struggle in structured environments with high obstacle density and a high number of agents. To further improve the performance of the communication-based MARL-MAPF solvers, we propose a new method, Ensembling Prioritized Hybrid Policies (EPH). We first propose a selective communication block to gather richer information for better agent coordination within multi-agent environments and train the model with a Q learning-based algorithm. We further introduce three advanced inference strategies aimed at bolstering performance during the execution phase. First, we hybridize the neural policy with single-agent expert guidance for navigating conflict-free zones. Secondly, we propose Q value-based methods for prioritized resolution of conflicts as well as deadlock situations. Finally, we introduce a robust ensemble method that can efficiently collect the best out of multiple possible solutions. We empirically evaluate EPH in complex multi-agent environments and demonstrate competitive performance against state-of-the-art neural methods for MAPF. We open-source our code at https://github.com/ai4co/eph-mapf.
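To make the Q value-based prioritization concrete, here is a minimal, assumed sketch (hypothetical function and fallback conventions, not EPH's actual implementation): when two agents claim the same cell, the agent whose Q values show the largest gap between best and second-best action keeps its move, and the other falls back.

```python
import numpy as np

def resolve_conflicts(q_values: np.ndarray, proposed: list) -> list:
    """Greedy Q value-based conflict resolution (illustrative sketch only).

    q_values: (n_agents, n_actions) per-agent Q estimates, n_actions >= 2.
    proposed: proposed[i] is the cell (hashable) agent i's argmax action reaches.
    Returns one action index per agent, breaking vertex conflicts in favor
    of the agent that loses the most Q value by deviating.
    """
    actions = q_values.argmax(axis=1).tolist()
    cells = list(proposed)
    # Priority = gap between best and second-best Q: a large gap means
    # deviating is costly, so that agent should keep its preferred move.
    sorted_q = np.sort(q_values, axis=1)
    priority = sorted_q[:, -1] - sorted_q[:, -2]
    claimed = set()
    for i in np.argsort(-priority):  # high-priority agents claim cells first
        if cells[i] in claimed:
            # Conflict: take the second-best action instead; its target cell
            # is unknown in this sketch, so mark a unique placeholder.
            actions[i] = int(np.argsort(-q_values[i])[1])
            cells[i] = ("fallback", int(i))  # hypothetical placeholder cell
        claimed.add(cells[i])
    return actions
```

The same best-versus-second-best gap could also rank agents when escaping deadlocks, which is the spirit of the prioritized resolution described in the abstract.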
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that plans the agent's own behavior against the inferred goals.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
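As a rough sketch of the goal-inference step an opponent modeling module performs, a generic Bayesian belief update (illustrative only, not HOP's actual procedure):

```python
import numpy as np

def update_goal_belief(prior: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """Posterior over an opponent's goals after observing one action.

    prior:      (n_goals,) current belief P(g).
    likelihood: (n_goals,) P(observed action | g) under each goal-conditioned policy.
    """
    posterior = prior * likelihood
    return posterior / posterior.sum()
```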
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding [49.730902939565986]
The Multi-Agent Pathfinding problem involves finding a set of conflict-free paths for a group of agents confined to a graph.
In this study, we focus on the decentralized MAPF setting, where the agents may observe the other agents only locally.
We propose a decentralized multi-agent Monte Carlo Tree Search (MCTS) method for MAPF tasks.
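The search backbone is standard MCTS; a generic sketch of the UCT child-selection rule such methods build on (illustrative only, not the paper's decentralized variant):

```python
import math

def uct_select(children, c: float = 1.4):
    """Pick the child maximizing exploitation + exploration (UCT).

    children: nodes exposing .visits (int) and .value (cumulative return).
    """
    total = sum(ch.visits for ch in children)

    def score(ch):
        if ch.visits == 0:
            return float("inf")  # always try unvisited children first
        return ch.value / ch.visits + c * math.sqrt(math.log(total) / ch.visits)

    return max(children, key=score)
```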
arXiv Detail & Related papers (2023-12-26T06:57:22Z)
- SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially Observable Multi-Agent Path Finding [3.4260993997836753]
We propose a novel multi-agent actor-critic method called Soft Actor-Critic with Heuristic-Based Attention (SACHA).
SACHA learns a neural network for each agent to selectively pay attention to the shortest path guidance from multiple agents within its field of view.
We demonstrate decent improvements over several state-of-the-art learning-based MAPF methods with respect to success rate and solution quality.
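The core mechanism can be pictured as attention over per-neighbor shortest-path features; a minimal sketch with assumed tensor shapes (not SACHA's actual architecture):

```python
import torch
import torch.nn.functional as F

def heuristic_attention(query: torch.Tensor, guidance: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention over neighbors' shortest-path embeddings.

    query:    (d,)   the ego agent's embedding.
    guidance: (k, d) one guidance embedding per agent in the field of view.
    Returns a (d,) summary the policy can condition on.
    """
    scores = guidance @ query / query.shape[-1] ** 0.5  # (k,) similarity scores
    weights = F.softmax(scores, dim=-1)                 # attention distribution
    return weights @ guidance                           # weighted mix of guidance
```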
arXiv Detail & Related papers (2023-07-05T23:36:33Z)
- Multi-Agent Path Finding with Prioritized Communication Learning [44.89255851944412]
We propose a PrIoritized COmmunication learning method (PICO), which incorporates the implicit planning priorities into the communication topology.
PICO performs significantly better in large-scale MAPF tasks in success rates and collision rates than state-of-the-art learning-based planners.
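One way to picture priorities shaping a communication topology (an assumed sketch, not PICO's actual construction): each agent listens only to the k highest-priority other agents.

```python
import numpy as np

def priority_topology(priorities: np.ndarray, k: int = 2) -> dict:
    """Map each receiver to the k highest-priority senders (excluding itself).

    priorities: (n,) planning priority per agent; higher means more important.
    """
    ranked = np.argsort(-priorities)  # agents from highest to lowest priority
    return {
        i: [int(j) for j in ranked if j != i][:k]
        for i in range(priorities.shape[0])
    }
```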
arXiv Detail & Related papers (2022-02-08T04:04:19Z)
- MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization [17.825845543579195]
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
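A minimal sketch of a critic with a recurrent layer (an assumed GRU-based module, not MACRPO's actual architecture):

```python
import torch
import torch.nn as nn

class RecurrentCritic(nn.Module):
    """Value head on top of a GRU, so estimates can depend on history."""

    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) -> (batch, time, 1) values per step
        h, _ = self.gru(obs_seq)
        return self.head(h)
```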
arXiv Detail & Related papers (2021-09-02T12:43:35Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Distributed Heuristic Multi-Agent Path Finding with Communication [7.854890646114447]
Multi-Agent Path Finding (MAPF) is essential to large-scale robotic systems.
Recent methods have applied reinforcement learning (RL) to learn decentralized policies in partially observable environments.
This paper combines communication with deep Q-learning to provide a novel learning-based method for MAPF.
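The Q-learning half rests on the standard one-step TD target, sketched generically here (not the paper's exact loss):

```python
import torch

def td_target(reward: torch.Tensor, next_q: torch.Tensor,
              done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """r + gamma * max_a' Q(s', a'), zeroing the bootstrap at episode ends.

    reward, done: (batch,) with done in {0, 1}; next_q: (batch, n_actions).
    """
    return reward + gamma * (1.0 - done) * next_q.max(dim=-1).values
```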
arXiv Detail & Related papers (2021-06-21T18:50:58Z)
- Compilation-based Solvers for Multi-Agent Path Finding: a Survey, Discussion, and Future Opportunities [7.766921168069532]
We show the lessons learned from past developments and current trends in the topic and discuss its wider impact.
Two major approaches to optimal MAPF solving include (1) dedicated search-based methods, which solve MAPF directly, and (2) compilation-based methods, which reduce a MAPF instance to an instance in a different, well-established formalism.
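A toy illustration of the compilation idea: introduce one Boolean variable per (agent, vertex, timestep) triple and emit CNF clauses forbidding two agents from sharing a vertex. This assumed encoding is far simpler than the full reductions the survey covers:

```python
from itertools import combinations

def vertex_collision_clauses(n_agents: int, n_vertices: int, horizon: int) -> list:
    """CNF clauses (DIMACS-style literal lists) banning vertex collisions."""

    def var(a: int, v: int, t: int) -> int:
        # "agent a occupies vertex v at time t" as a positive variable id
        return 1 + a + n_agents * (v + n_vertices * t)

    clauses = []
    for t in range(horizon):
        for v in range(n_vertices):
            for a1, a2 in combinations(range(n_agents), 2):
                # not(x1 and x2)  ==  (-x1 or -x2)
                clauses.append([-var(a1, v, t), -var(a2, v, t)])
    return clauses
```

A full reduction would also encode start/goal positions, movement along edges, and edge-conflict constraints before handing the formula to a SAT solver.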
arXiv Detail & Related papers (2021-04-23T20:13:12Z)
- The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games [67.47961797770249]
Multi-Agent PPO (MAPPO) is a multi-agent PPO variant which adopts a centralized value function.
We show that MAPPO achieves performance comparable to the state-of-the-art in three popular multi-agent testbeds.
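The architectural choice is easy to sketch: actors see local observations while the critic scores the joint observation (an assumed minimal module, not the reference implementation):

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Value function over the concatenated observations of all agents."""

    def __init__(self, obs_dim: int, n_agents: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * n_agents, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, joint_obs: torch.Tensor) -> torch.Tensor:
        # joint_obs: (batch, n_agents * obs_dim) -> (batch, 1) state value
        return self.net(joint_obs)
```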
arXiv Detail & Related papers (2021-03-02T18:59:56Z)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability for large-scale environments and reduces information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
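The factored critic can be pictured as per-agent utilities combined into one joint value; the monotonic, QMIX-style mix below is purely an assumed illustration, not FACMAC's exact mixing network:

```python
import torch

def factored_joint_q(agent_qs: torch.Tensor, mix_weights: torch.Tensor) -> torch.Tensor:
    """Combine per-agent utilities into a joint value.

    agent_qs:    (batch, n_agents) per-agent Q(s, a_i).
    mix_weights: (batch, n_agents) weights, e.g. produced by a hypernetwork;
                 abs() keeps the joint value monotonic in each utility.
    """
    return (agent_qs * mix_weights.abs()).sum(dim=1, keepdim=True)  # (batch, 1)
```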
arXiv Detail & Related papers (2020-03-14T21:29:09Z)