Multi-Agent Path Finding via Offline RL and LLM Collaboration
- URL: http://arxiv.org/abs/2509.22130v1
- Date: Fri, 26 Sep 2025 09:53:40 GMT
- Title: Multi-Agent Path Finding via Offline RL and LLM Collaboration
- Authors: Merve Atasever, Matthew Hong, Mihir Nitin Kulkarni, Qingpei Li, Jyotirmoy V. Deshmukh
- Abstract summary: Multi-Agent Path Finding (MAPF) poses a significant and challenging problem for applications in robotics and logistics. We propose an efficient decentralized planning framework based on the Decision Transformer (DT). Our approach effectively handles long-horizon credit assignment and significantly improves performance in scenarios with sparse and delayed rewards.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-Agent Path Finding (MAPF) poses a significant and challenging problem critical for applications in robotics and logistics, particularly due to its combinatorial complexity and the partial observability inherent in realistic environments. Decentralized reinforcement learning methods commonly encounter two substantial difficulties: first, they often yield self-centered behaviors among agents, resulting in frequent collisions, and second, their reliance on complex communication modules leads to prolonged training times, sometimes spanning weeks. To address these challenges, we propose an efficient decentralized planning framework based on the Decision Transformer (DT), uniquely leveraging offline reinforcement learning to substantially reduce training durations from weeks to mere hours. Crucially, our approach effectively handles long-horizon credit assignment and significantly improves performance in scenarios with sparse and delayed rewards. Furthermore, to overcome adaptability limitations inherent in standard RL methods under dynamic environmental changes, we integrate a large language model (GPT-4o) to dynamically guide agent policies. Extensive experiments in both static and dynamically changing environments demonstrate that our DT-based approach, augmented briefly by GPT-4o, significantly enhances adaptability and performance.
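The abstract's core mechanism is Decision Transformer-style acting: conditioning each action on a target return-to-go that shrinks as rewards accrue. The minimal sketch below illustrates only that conditioning loop; the `DTAgent` and `greedy_model` names are hypothetical stand-ins (the real system uses a trained transformer policy, which is not reproduced here).

```python
# Minimal, hypothetical sketch of return-to-go (RTG) conditioned acting,
# the core idea behind Decision Transformer planners. Not the authors'
# implementation: greedy_model stands in for the trained transformer.
from dataclasses import dataclass, field

@dataclass
class DTAgent:
    target_return: float
    history: list = field(default_factory=list)  # (state, action, reward)

    def act(self, state, model):
        # Remaining return-to-go = target minus rewards collected so far.
        rtg = self.target_return - sum(r for _, _, r in self.history)
        return model(rtg, state, self.history), rtg

    def record(self, state, action, reward):
        self.history.append((state, action, reward))

def greedy_model(rtg, state, history):
    # Stand-in policy: step toward the goal cell at (0, 0).
    x, y = state
    if x > 0:
        return (-1, 0)
    if y > 0:
        return (0, -1)
    return (0, 0)

agent = DTAgent(target_return=-4.0)  # 4 steps at -1 reward each
state = (2, 2)
for _ in range(4):
    action, rtg = agent.act(state, greedy_model)
    agent.record(state, action, -1.0)  # per-step cost
    state = (state[0] + action[0], state[1] + action[1])
print(state)  # agent reaches the goal cell
```

In the paper's decentralized setting, each agent would run such a loop over its own partial observation history; the sketch collapses that to a single agent on a toy grid.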
Related papers
- A Curriculum-Based Deep Reinforcement Learning Framework for the Electric Vehicle Routing Problem [0.4666493857924357]
Electric vehicle routing problem with time windows (EVRPTW) is a complex optimization problem in sustainable logistics. We propose a curriculum-based deep reinforcement learning (CB-DRL) framework designed to resolve this instability.
arXiv Detail & Related papers (2026-01-21T14:42:33Z) - Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency. Online reasoning is performed to guide the training process through two mechanisms. We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z) - Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments [2.116865312302264]
We propose a scalable, region-aware reinforcement learning framework for path planning in dynamic environments. Our method builds on the observation that environmental changes, although dynamic, are often localized within bounded regions.
arXiv Detail & Related papers (2025-11-19T09:48:44Z) - Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments. We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z) - Human-in-the-loop Online Rejection Sampling for Robotic Manipulation [55.99788088622936]
Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning. Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training.
arXiv Detail & Related papers (2025-10-30T11:53:08Z) - Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications [0.0]
We propose an adaptive online learning unit integrated within the metascheduler to enhance performance in real-time. In the online mode, Reinforcement Learning plays a pivotal role by continuously exploring and discovering new scheduling solutions. Several RL models were implemented within the online learning unit, each designed to address specific challenges in scheduling.
arXiv Detail & Related papers (2025-09-24T19:46:22Z) - Large Language Model-Empowered Decision Transformer for UAV-Enabled Data Collection [71.84636717632206]
Using unmanned aerial vehicles (UAVs) for reliable and energy-efficient data collection from spatially distributed devices holds great promise for supporting Internet of Things (IoT) applications. We propose a large language model (LLM)-empowered Decision Transformer framework (LLM-CRDT) to learn effective UAV control policies. LLM-CRDT outperforms benchmark online and offline methods, achieving up to 36.7% higher energy efficiency than current state-of-the-art DT approaches.
arXiv Detail & Related papers (2025-09-17T13:05:08Z) - Accelerating Privacy-Preserving Federated Learning in Large-Scale LEO Satellite Systems [57.692181589325116]
Large-scale low-Earth-orbit (LEO) satellite systems are increasingly valued for their ability to enable rapid and wide-area data exchange. Due to privacy concerns and regulatory constraints, raw data collected at remote clients cannot be centrally aggregated. Federated learning offers a privacy-preserving alternative by training local models on distributed devices and exchanging only model parameters. We propose a discrete temporal graph-based on-demand scheduling framework that dynamically allocates communication resources to accelerate federated learning.
arXiv Detail & Related papers (2025-09-05T03:33:42Z) - GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning [15.43938821214447]
Reinforcement Learning with Verifiable Rewards (RLVR) has recently emerged as a powerful paradigm for facilitating the self-improvement of large language models (LLMs). This paper introduces Guided Hybrid Policy Optimization (GHPO), a novel difficulty-aware reinforcement learning framework. GHPO dynamically calibrates task difficulty by employing adaptive prompt refinement to provide targeted guidance.
arXiv Detail & Related papers (2025-07-14T08:10:00Z) - Adaptive Resource Allocation Optimization Using Large Language Models in Dynamic Wireless Environments [25.866960634041092]
Current solutions rely on domain-specific architectures or techniques, and a general DL approach for constrained optimization remains undeveloped. We propose a large language model for resource allocation (LLM-RAO) to address the complex resource allocation problem while adhering to constraints. LLM-RAO achieves up to a 40% performance enhancement compared to conventional DL methods and up to an 80% improvement over analytical approaches.
arXiv Detail & Related papers (2025-02-04T12:56:59Z) - Continual Task Learning through Adaptive Policy Self-Composition [54.95680427960524]
CompoFormer is a structure-based continual transformer model that adaptively composes previous policies via a meta-policy network.
Our experiments reveal that CompoFormer outperforms conventional continual learning (CL) methods, particularly in longer task sequences.
arXiv Detail & Related papers (2024-11-18T08:20:21Z) - Personalized Wireless Federated Learning for Large Language Models [75.22457544349668]
Large language models (LLMs) have driven profound transformations in wireless networks. Within wireless environments, the training of LLMs faces significant challenges related to security and privacy. This paper presents a systematic analysis of the training stages of LLMs in wireless networks, including pre-training, instruction tuning, and alignment tuning.
arXiv Detail & Related papers (2024-04-20T02:30:21Z) - P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer [49.716834343064015]
Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model. We propose a novel solution: the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies.
arXiv Detail & Related papers (2024-01-22T02:58:53Z) - Self-Sustaining Multiple Access with Continual Deep Reinforcement
Learning for Dynamic Metaverse Applications [17.436875530809946]
The Metaverse is a new paradigm that aims to create a virtual environment consisting of numerous worlds, each of which will offer a different set of services.
To deal with such a dynamic and complex scenario, one potential approach is to adopt self-sustaining strategies.
This paper investigates the problem of multiple access in multi-channel environments to maximize the throughput of the intelligent agent.
arXiv Detail & Related papers (2023-09-18T22:02:47Z) - Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.