Safe Multi-Agent Navigation guided by Goal-Conditioned Safe Reinforcement Learning
- URL: http://arxiv.org/abs/2502.17813v2
- Date: Fri, 07 Mar 2025 02:09:32 GMT
- Title: Safe Multi-Agent Navigation guided by Goal-Conditioned Safe Reinforcement Learning
- Authors: Meng Feng, Viraj Parimi, Brian Williams
- Abstract summary: We introduce a novel method that integrates the strengths of both planning and safe RL. Our method prunes unsafe edges and generates a waypoint-based plan that the agent follows until reaching its goal. In particular, we leverage Conflict-Based Search (CBS) to create waypoint-based plans for multiple agents, allowing for their safe navigation over extended horizons.
- Score: 2.082168997977094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safe navigation is essential for autonomous systems operating in hazardous environments. Traditional planning methods excel at long-horizon tasks but rely on a predefined graph with fixed distance metrics. In contrast, safe Reinforcement Learning (RL) can learn complex behaviors without relying on manual heuristics but fails to solve long-horizon tasks, particularly in goal-conditioned and multi-agent scenarios. In this paper, we introduce a novel method that integrates the strengths of both planning and safe RL. Our method leverages goal-conditioned RL and safe RL to learn a goal-conditioned policy for navigation while concurrently estimating cumulative distance and safety levels using learned value functions via an automated self-training algorithm. By constructing a graph with states from the replay buffer, our method prunes unsafe edges and generates a waypoint-based plan that the agent follows until reaching its goal, effectively balancing faster and safer routes over extended distances. Utilizing this unified high-level graph and a shared low-level goal-conditioned safe RL policy, we extend this approach to address the multi-agent safe navigation problem. In particular, we leverage Conflict-Based Search (CBS) to create waypoint-based plans for multiple agents, allowing for their safe navigation over extended horizons. This integration enhances the scalability of goal-conditioned safe RL in multi-agent scenarios, enabling efficient coordination among agents. Extensive benchmarking against state-of-the-art baselines demonstrates the effectiveness of our method in achieving distance goals safely for multiple agents in complex and hazardous environments. Our code and further details about our work are available at https://safe-visual-mapf-mers.csail.mit.edu/.
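The high-level planning step described in the abstract (connect replay-buffer states using the learned distance and safety value functions, prune unsafe edges, then extract a waypoint plan) can be illustrated with a minimal Python sketch. All names and thresholds below (dist_value, safety_value, SAFETY_THRESHOLD, MAX_EDGE_DIST) are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: building a waypoint graph from replay-buffer states
# with learned goal-conditioned value functions, pruning unsafe edges, and
# planning over the result. Names and thresholds are assumptions.
import heapq
import itertools

SAFETY_THRESHOLD = 0.9   # assumed: minimum acceptable safety estimate for an edge
MAX_EDGE_DIST = 5.0      # assumed: skip edges the low-level policy cannot reliably reach

def build_pruned_graph(states, dist_value, safety_value):
    """Connect replay-buffer states whose learned distance is small enough,
    dropping edges whose learned safety estimate falls below a threshold."""
    graph = {i: [] for i in range(len(states))}
    for i, j in itertools.permutations(range(len(states)), 2):
        d = dist_value(states[i], states[j])    # estimated steps from state i to state j
        s = safety_value(states[i], states[j])  # estimated probability of staying safe
        if d <= MAX_EDGE_DIST and s >= SAFETY_THRESHOLD:
            graph[i].append((j, d))
    return graph

def waypoint_plan(graph, start, goal):
    """Dijkstra over the pruned graph; the returned node sequence would be handed
    to the shared goal-conditioned safe RL policy one waypoint at a time."""
    dist, prev = {start: 0.0}, {}
    frontier = [(0.0, start)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(frontier, (nd, v))
    if goal not in dist:
        return None                       # no safe route was found
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return list(reversed(path))
```

The sketch only covers the single-agent planning step; for the multi-agent case, the abstract describes running Conflict-Based Search over per-agent waypoint plans on this same shared graph.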
Related papers
- Multi-agent Path Finding for Timed Tasks using Evolutionary Games [1.3023548510259344]
We show that our algorithm is faster than deep RL methods by at least an order of magnitude.
Our results indicate that it scales better with an increase in the number of agents as compared to other methods.
arXiv Detail & Related papers (2024-11-15T20:10:25Z)
- Safe Policy Exploration Improvement via Subgoals [44.07721205323709]
Reinforcement learning is a widely used approach to autonomous navigation, showing potential in various tasks and robotic setups.
One of the main reasons for poor performance in such setups is that the need to respect the safety constraints degrades the exploration capabilities of an RL agent.
We introduce a novel learnable algorithm that is based on decomposing the initial problem into smaller sub-problems via intermediate goals.
arXiv Detail & Related papers (2024-08-25T16:12:49Z)
- Offline Goal-Conditioned Reinforcement Learning for Safety-Critical Tasks with Recovery Policy [4.854443247023496]
Offline goal-conditioned reinforcement learning (GCRL) aims to solve goal-reaching tasks with sparse rewards from an offline dataset.
We propose a new method called Recovery-based Supervised Learning (RbSL) to accomplish safety-critical tasks with various goals.
arXiv Detail & Related papers (2024-03-04T05:20:57Z)
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- A Multiplicative Value Function for Safe and Efficient Reinforcement Learning [131.96501469927733]
We propose a safe model-free RL algorithm with a novel multiplicative value function consisting of a safety critic and a reward critic.
The safety critic predicts the probability of constraint violation and discounts the reward critic that only estimates constraint-free returns.
We evaluate our method in four safety-focused environments, including classical RL benchmarks augmented with safety constraints and robot navigation tasks with images and raw Lidar scans as observations.
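The discounting mechanism described above lends itself to a short sketch; the plain product form and every name below are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a multiplicative value estimate: a safety critic that predicts
# the probability of constraint violation discounts a reward critic that estimates
# constraint-free return. Names and the simple product form are assumptions.
def multiplicative_value(safety_critic, reward_critic, obs, action):
    p_violation = safety_critic(obs, action)   # estimated violation probability in [0, 1]
    reward_value = reward_critic(obs, action)  # return estimate ignoring constraints
    return (1.0 - p_violation) * reward_value  # risk-discounted value used for control
```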
arXiv Detail & Related papers (2023-03-07T18:29:15Z)
- Reinforcement Learning-Based Air Traffic Deconfliction [7.782300855058585]
This work focuses on automating the horizontal separation of two aircraft and presents the obstacle avoidance problem as a 2D surrogate optimization task.
Using Reinforcement Learning (RL), we optimize the avoidance policy and model the dynamics, interactions, and decision-making.
The proposed system generates a quick and achievable avoidance trajectory that satisfies the safety requirements.
arXiv Detail & Related papers (2023-01-05T00:37:20Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
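The summary does not spell out how the two agents interact; one plausible wiring, sketched here purely as an assumption, is for the safe agent to replace the baseline action whenever a risk estimate is too high.

```python
# Hypothetical wiring of a baseline/safe-agent pair; the override rule, the risk
# critic, and the threshold are assumptions for illustration, not the paper's design.
RISK_THRESHOLD = 0.1  # assumed maximum acceptable estimated risk

def dual_agent_action(baseline_agent, safe_agent, risk_critic, obs):
    action = baseline_agent(obs)                    # task-driven proposal
    if risk_critic(obs, action) > RISK_THRESHOLD:   # proposal judged too risky
        action = safe_agent(obs)                    # defer to the risk-aware agent
    return action
```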
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Evaluating Model-free Reinforcement Learning toward Safety-critical Tasks [70.76757529955577]
This paper revisits prior work in this scope from the perspective of state-wise safe RL.
We propose Unrolling Safety Layer (USL), a joint method that combines safety optimization and safety projection.
To facilitate further research in this area, we reproduce related algorithms in a unified pipeline and incorporate them into SafeRL-Kit.
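The "safety projection" component named above can be illustrated generically: nudge a proposed action until a learned cost critic predicts it satisfies the budget. The gradient rule and every name below are assumptions, not USL's actual procedure.

```python
# Generic sketch of a post-hoc safety projection step using a learned cost critic.
# Assumes PyTorch tensors; names and the simple gradient correction are illustrative.
import torch

def project_action(cost_critic, obs, action, budget=0.0, steps=10, lr=0.05):
    action = action.clone().detach().requires_grad_(True)
    for _ in range(steps):
        cost = cost_critic(obs, action)     # predicted constraint cost of the action
        if cost.item() <= budget:           # already within the safety budget
            break
        cost.backward()
        with torch.no_grad():
            action -= lr * action.grad      # move the action toward lower predicted cost
        action.grad.zero_()
    return action.detach()
```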
arXiv Detail & Related papers (2022-12-12T06:30:17Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Benchmarking Safe Deep Reinforcement Learning in Aquatic Navigation [78.17108227614928]
We propose a benchmark environment for Safe Reinforcement Learning focusing on aquatic navigation.
We consider both value-based and policy-gradient Deep Reinforcement Learning (DRL) approaches.
We also propose a verification strategy that checks the behavior of the trained models over a set of desired properties.
arXiv Detail & Related papers (2021-12-16T16:53:56Z)
- Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach [2.741266294612776]
We propose a model-free safety specification method that learns the maximal probability of safe operation.
Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage.
It yields a sequence of safe policies that determine the range of safe operation, called the safe set.
arXiv Detail & Related papers (2020-02-24T09:20:03Z)