Related papers: Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse

URL: http://arxiv.org/abs/2509.23778v2
Date: Tue, 30 Sep 2025 12:39:02 GMT
Title: Sequence Pathfinder for Multi-Agent Pickup and Delivery in the Warehouse
Authors: Zeyuan Zhao, Chaoran Li, Shao Zhang, Ying Wen
Abstract summary: Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF). Communication learning can alleviate the lack of global information but introduces high computational complexity due to point-to-point communication. We propose the Sequential Pathfinder (SePar) to achieve implicit information exchange, reducing decision-making complexity from exponential to linear.
Score: 10.576983033957953
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information but introduces high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring its effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning in complex maps like warehouses.

Related papers

PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception [12.114711272142031]
PC2P is a novel distributed MAPF method derived from a Q-learning-based MARL framework. We introduce a personalized-enhanced communication mechanism based on dynamic graph topology. To resolve extreme deadlock issues, we propose a region-based deadlock-breaking strategy.
arXiv Detail & Related papers (2026-01-06T03:11:26Z)
Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems [17.3780399150554]
This paper proposes an end-to-end centralized decision-making framework based on sequence-to-sequence modeling, named Multi-Agent Pointer Transformer (MAPT). MAPT significantly outperforms existing baseline methods and offers substantial computational time advantages over classical operations research methods.
arXiv Detail & Related papers (2025-11-21T17:32:10Z)
Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments. We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z)
Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents [56.625878022978945]
Large Language Models (LLMs) as autonomous agents are increasingly tasked with solving complex, long-horizon problems. Direct Preference Optimization (DPO) provides a signal that is too coarse for precise credit assignment, while step-level DPO is often too myopic to capture the value of multi-step behaviors. We introduce Hierarchical Preference Learning (HPL), a hierarchical framework that optimizes LLM agents by leveraging preference signals at multiple, synergistic granularities.
arXiv Detail & Related papers (2025-09-26T08:43:39Z)
MAPF-World: Action World Model for Multi-Agent Path Finding [17.847921829680576]
Multi-agent path finding (MAPF) is the problem of planning conflict-free paths from the designated start locations to goal positions for multiple agents. Recent decentralized learnable solvers have shown great promise for large-scale MAPF. We propose MAPF-World, an autoregressive action world model for MAPF that unifies situation understanding and action generation.
arXiv Detail & Related papers (2025-08-16T15:50:26Z)
Combining Planning and Reinforcement Learning for Solving Relational Multiagent Domains [16.56659112347106]
Multiagent Reinforcement Learning (MARL) poses significant challenges due to the exponential growth of state and action spaces. We propose integrating relational planners as centralized controllers with efficient state abstractions and reinforcement learning.
arXiv Detail & Related papers (2025-02-26T16:55:23Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models [57.45019514036948]
Multi-Agent Path Finding (MAPF) is a fundamental problem in robotics. This work proposes a novel approach that integrates constrained optimization with diffusion models for MAPF in continuous spaces.
arXiv Detail & Related papers (2024-12-23T21:27:19Z)
Modality Prompts for Arbitrary Modality Salient Object Detection [57.610000247519196]
This paper delves into the task of arbitrary modality salient object detection (AM SOD). It aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) will be proposed to investigate two fundamental challenges of AM SOD.
arXiv Detail & Related papers (2024-05-06T11:02:02Z)
Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
Model-free Motion Planning of Autonomous Agents for Complex Tasks in Partially Observable Environments [3.7660066212240753]
Motion planning of autonomous agents in partially known environments is a challenging problem. This paper proposes a model-free reinforcement learning approach to address this problem. We show that our proposed method effectively addresses environment, action, and observation uncertainties.
arXiv Detail & Related papers (2023-04-30T19:57:39Z)
Pruning Self-attentions into Convolutional Layers in Single Path [89.55361659622305]
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks. We propose Single-Path Vision Transformer pruning (SPViT) to efficiently and automatically compress the pre-trained ViTs. Our SPViT can trim 52.0% FLOPs for DeiT-B and get an impressive 0.6% top-1 accuracy gain simultaneously.
arXiv Detail & Related papers (2021-11-23T11:35:54Z)
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph. Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference. Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.