Related papers: Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments

Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments

URL: http://arxiv.org/abs/2511.15284v1
Date: Wed, 19 Nov 2025 09:48:44 GMT
Title: Path Planning through Multi-Agent Reinforcement Learning in Dynamic Environments
Authors: Jonas De Maeyer, Hossein Yarahmadi, Moharram Challenger,
Abstract summary: We propose a scalable, region-aware reinforcement learning framework for path planning in dynamic environments.<n>Our method builds on the observation that environmental changes, although dynamic, are often localized within bounded regions.
Score: 2.116865312302264
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Path planning in dynamic environments is a fundamental challenge in intelligent transportation and robotics, where obstacles and conditions change over time, introducing uncertainty and requiring continuous adaptation. While existing approaches often assume complete environmental unpredictability or rely on global planners, these assumptions limit scalability and practical deployment in real-world settings. In this paper, we propose a scalable, region-aware reinforcement learning (RL) framework for path planning in dynamic environments. Our method builds on the observation that environmental changes, although dynamic, are often localized within bounded regions. To exploit this, we introduce a hierarchical decomposition of the environment and deploy distributed RL agents that adapt to changes locally. We further propose a retraining mechanism based on sub-environment success rates to determine when policy updates are necessary. Two training paradigms are explored: single-agent Q-learning and multi-agent federated Q-learning, where local Q-tables are aggregated periodically to accelerate the learning process. Unlike prior work, we evaluate our methods in more realistic settings, where multiple simultaneous obstacle changes and increasing difficulty levels are present. Results show that the federated variants consistently outperform their single-agent counterparts and closely approach the performance of A* Oracle while maintaining shorter adaptation times and robust scalability. Although initial training remains time-consuming in large environments, our decentralized framework eliminates the need for a global planner and lays the groundwork for future improvements using deep RL and flexible environment decomposition.

Related papers

Sample-Efficient Neurosymbolic Deep Reinforcement Learning [49.60927398960061]
We propose a neuro-symbolic Deep RL approach that integrates background symbolic knowledge to improve sample efficiency.<n>Online reasoning is performed to guide the training process through two mechanisms.<n>We show improved performance over a state-of-the-art reward machine baseline.
arXiv Detail & Related papers (2026-01-06T09:28:53Z)
Grounded Test-Time Adaptation for LLM Agents [75.62784644919803]
Large language model (LLM)-based agents struggle to generalize to novel and complex environments.<n>We propose two strategies for adapting LLM agents by leveraging environment-specific information available during deployment.
arXiv Detail & Related papers (2025-11-06T22:24:35Z)
Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications [0.0]
We propose an adaptive online learning unit integrated within the metascheduler to enhance performance in real-time.<n>In the online mode, Reinforcement Learning plays a pivotal role by continuously exploring and discovering new scheduling solutions.<n>Several RL models were implemented within the online learning unit, each designed to address specific challenges in scheduling.
arXiv Detail & Related papers (2025-09-24T19:46:22Z)
DyPNIPP: Predicting Environment Dynamics for RL-based Robust Informative Path Planning [13.462524685985818]
DyPNIPP is a robust RL-based IPP framework designed to effectively acrosstemporal environments. Our experiments in a wildfire environment demonstrate that DyPNIPP outperforms existing RL-based IPP algorithms.
arXiv Detail & Related papers (2024-10-22T17:07:26Z)
Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts [0.15889427269227555]
We develop an adaptive re-training algorithm inspired by evolutionary game theory (EGT) ERPO shows faster policy adaptation, higher average rewards, and reduced computational costs in policy adaptation.
arXiv Detail & Related papers (2024-10-22T09:29:53Z)
Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations [22.6449779859417]
General intelligence requires quick adaption across tasks.<n>In this paper, we explore a wider range of scenarios where not only the distribution but also the environment spaces may change.<n>We introduce a causality-guided self-adaptive representation-based approach, called CSR, that equips the agent to generalize effectively.
arXiv Detail & Related papers (2024-07-30T08:48:49Z)
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind. This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments [89.04823188871906]
Generation of diverse realistic scenarios is challenging for real-time strategy (RTS) environments. Most of the existing simulators rely on randomly generating the environments. We introduce the benefits of adopting an existing formal scenario specification language, SCENIC, to assist researchers.
arXiv Detail & Related papers (2021-06-18T21:49:46Z)
Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes. Our design is based on a novel bilevel optimization formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment. We propose to build the notion of continual learning into the modeling process of learning wireless systems. Our design is based on a novel min-max formulation which ensures certain fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
MAPPER: Multi-Agent Path Planning with Evolutionary Reinforcement Learning in Mixed Dynamic Environments [30.407700996710023]
This paper proposes a decentralized partially observable multi-agent path planning with evolutionary reinforcement learning (MAPPER) method. We decompose the long-range navigation task into many easier sub-tasks under the guidance of a global planner. Our approach models dynamic obstacles' behavior with an image-based representation and trains a policy in mixed dynamic environments without homogeneity assumption.
arXiv Detail & Related papers (2020-07-30T20:14:42Z)
Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity. Our method leverages latent variable models to learn a representation of the environment from current and past experiences. We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.