Related papers: Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

URL: http://arxiv.org/abs/2505.17659v3
Date: Fri, 26 Sep 2025 04:19:49 GMT
Title: Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
Authors: Xiaolong Tang, Meina Kan, Shiguang Shan, Xilin Chen
Abstract summary: We propose a two-stage trajectory planning framework that decouples principle alignment from behavior learning. Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance.
Score: 74.41886258801209
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Safe and feasible trajectory planning is critical for real-world autonomous driving systems. However, existing learning-based planners rely heavily on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting undesirable behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning. In the first stage, a general trajectory predictor is pre-trained on expert data to capture diverse, human-like driving behaviors. In the second stage, the model is fine-tuned with rule-based rewards using Group Relative Policy Optimization (GRPO), explicitly aligning ego planning with principles such as safety, comfort, and traffic rule compliance. This two-stage paradigm retains human-like behaviors while enhancing safety awareness and discarding undesirable patterns from demonstrations. Furthermore, we identify a key limitation of directly applying GRPO to planning: group-wise normalization erases cross-group scale differences, causing rare, high-variance safety-violation groups to have similar advantages as abundant low-variance safe groups, thereby suppressing optimization for safety-critical objectives. To address this, we propose Variance-Decoupled GRPO (VD-GRPO), which replaces normalization with centering and fixed scaling to preserve absolute reward magnitudes, ensuring that safety-critical objectives remain dominant throughout training. Experiments on the nuPlan benchmark demonstrate that Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance, particularly in realistic reactive settings. Our code is available at https://github.com/XiaolongTang23/Plan-R1.
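The normalization issue described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that VD-GRPO "replaces normalization with centering and fixed scaling", so the function names, the `scale` constant, and the two reward groups below are hypothetical values chosen for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: center, then divide by the group's own std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def vd_grpo_advantages(rewards, scale=1.0):
    """Centered advantages divided by a fixed constant (`scale` is an assumed value)."""
    mu = mean(rewards)
    return [(r - mu) / scale for r in rewards]


def max_abs(values):
    return max(abs(v) for v in values)


# Hypothetical reward groups: one with small spread, one containing a large penalty.
safe_group = [0.98, 1.00, 1.02, 1.00]
violation_group = [1.0, 1.0, -9.0, 1.0]

for name, group in [("safe", safe_group), ("violation", violation_group)]:
    print(
        name,
        "normalized:", round(max_abs(grpo_advantages(group)), 3),
        "centered:", round(max_abs(vd_grpo_advantages(group)), 3),
    )
```

With per-group normalization, the largest advantage in each group ends up on a similar scale even though the raw penalties differ by orders of magnitude. With centering and a fixed scale, the group containing the large penalty keeps a much larger advantage, which is the behavior the abstract attributes to VD-GRPO.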

Related papers

RAPiD: Real-time Deterministic Trajectory Planning via Diffusion Behavior Priors for Safe and Efficient Autonomous Driving [5.030754278104693]
RAPiD is a deterministic policy extraction framework that distills a pretrained diffusion-based planner into an efficient policy. To promote safety and passenger comfort, the policy is optimized using a critic trained to imitate a predictive driver controller.
arXiv Detail & Related papers (2026-02-07T03:44:50Z)
Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models [62.16655896700062]
Activation steering is a technique to enhance the utility of Large Language Models (LLMs). We show that it unintentionally introduces critical and under-explored safety risks. Experiments reveal that these interventions act as a force multiplier, creating new vulnerabilities to jailbreaks and increasing attack success rates to over 80% on standard benchmarks.
arXiv Detail & Related papers (2026-02-03T12:32:35Z)
Learning Safe Autonomous Driving Policies Using Predictive Safety Representations [0.0]
Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving. The Safety Representations for Safer Policy Learning (SRPL) framework addresses this challenge by equipping agents with a predictive model of future violations. This paper investigates whether SRPL extends to real-world autonomous driving scenarios.
arXiv Detail & Related papers (2025-12-19T13:52:19Z)
SUPER-AD: Semantic Uncertainty-aware Planning for End-to-End Robust Autonomous Driving [36.91878828972102]
We propose a camera-only E2E framework that estimates aleatoric uncertainty directly in BEV space and incorporates it into planning. Our method produces a dense, uncertainty-aware drivability map that captures both semantic structure and geometric layout at pixel-level resolution.
arXiv Detail & Related papers (2025-11-28T03:50:44Z)
Building a Foundational Guardrail for General Agentic Systems via Synthetic Data [76.18834864749606]
While LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks with difficulty, and filters outputs via an automated reward model.
arXiv Detail & Related papers (2025-10-10T18:42:32Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
Centaur: Robust End-to-End Autonomous Driving with Test-Time Training [84.78837437133234]
We propose Centaur, which updates a planner's behavior via test-time training without relying on hand-engineered rules or cost functions. We develop a novel uncertainty measure, called Cluster Entropy, which is simple, interpretable, and compatible with state-of-the-art planning algorithms.
arXiv Detail & Related papers (2025-03-14T17:59:41Z)
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance [19.204115959760788]
We propose a novel transformer-based Diffusion Planner for closed-loop planning. Our model supports joint modeling of both prediction and planning tasks. It achieves state-of-the-art closed-loop performance with robust transferability in diverse driving styles.
arXiv Detail & Related papers (2025-01-26T15:49:50Z)
LHPF: Look back the History and Plan for the Future in Autonomous Driving [10.855426442780516]
This paper introduces LHPF, an imitation learning planner that integrates historical planning information. Our approach employs a historical intention aggregation module that pools historical planning intentions. Experiments using both real-world and synthetic data demonstrate that LHPF not only surpasses existing advanced learning-based planners in planning performance but also marks the first instance of a purely learning-based planner outperforming the expert.
arXiv Detail & Related papers (2024-11-26T09:30:26Z)
RuleFuser: An Evidential Bayes Approach for Rule Injection in Imitation Learned Planners and Predictors for Robustness under Distribution Shifts [20.405998427564764]
RuleFuser combines IL planners with classical rule-based planners to draw on the complementary benefits of both. Our approach, tested on the real-world nuPlan dataset, achieves a 38.43% average improvement on safety metrics over the IL planner.
arXiv Detail & Related papers (2024-05-18T01:49:16Z)
LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner. Our approach navigates complex scenarios which existing planners struggle with, produces well-reasoned outputs while also remaining grounded through working alongside the rule-based approach.
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems. We propose a task-agnostic method named 'planning as in-painting'. The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
Integration of Reinforcement Learning Based Behavior Planning With Sampling Based Motion Planning for Automated Driving [0.5801044612920815]
We propose a method to employ a trained deep reinforcement learning policy for dedicated high-level behavior planning. To the best of our knowledge, this work is the first to apply deep reinforcement learning in this manner.
arXiv Detail & Related papers (2023-04-17T13:49:55Z)
Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior [135.78858513845233]
STRIVE is a method to automatically generate challenging scenarios that cause a given planner to produce undesirable behavior, like collisions. To maintain scenario plausibility, the key idea is to leverage a learned model of traffic motion in the form of a graph-based conditional VAE. A subsequent optimization is used to find a "solution" to the scenario, ensuring it is useful to improve the given planner.
arXiv Detail & Related papers (2021-12-09T18:03:27Z)
Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world. Current learning approaches for visual prediction and planning fail on long-horizon tasks. We propose a framework for visual prediction and planning that is able to overcome both of these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z)
The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules. In this paper we propose to incorporate structured priors as a loss function. We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
PiP: Planning-informed Trajectory Prediction for Autonomous Driving [69.41885900996589]
We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting. By informing the prediction process with the planning of ego vehicle, our method achieves the state-of-the-art performance of multi-agent forecasting on highway datasets.
arXiv Detail & Related papers (2020-03-25T16:09:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.