Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
- URL: http://arxiv.org/abs/2505.17659v2
- Date: Tue, 27 May 2025 14:51:12 GMT
- Title: Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling
- Authors: Xiaolong Tang, Meina Kan, Shiguang Shan, Xilin Chen
- Abstract summary: Plan-R1 is a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization.
- Score: 75.83583076519311
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safe and feasible trajectory planning is essential for real-world autonomous driving systems. However, existing learning-based planning methods often rely on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting unsafe behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task, guided by explicit planning principles such as safety, comfort, and traffic rule compliance. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization (GRPO), a reinforcement learning strategy, to align its predictions with these planning principles. Experiments on the nuPlan benchmark demonstrate that our Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance. Our code will be made public soon.
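The second stage described above lends itself to a short sketch. Below is a minimal illustration of scoring sampled candidate trajectories with rule-based rewards and standardizing them into group-relative advantages, as GRPO does; the reward terms, thresholds, and array shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def rule_based_reward(traj, obstacles, speed_limit):
    """Score one candidate trajectory against hypothetical planning rules.

    traj: (T, 2) ego positions per step; obstacles: (K, 2) obstacle
    positions; speed_limit in metres per step. Terms and thresholds
    are placeholders for the paper's actual reward design.
    """
    # Collision term: any waypoint within 2 m of an obstacle.
    dists = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
    collision = float((dists < 2.0).any())
    # Speed term: fraction of steps exceeding the limit.
    speeds = np.linalg.norm(np.diff(traj, axis=0), axis=-1)
    speeding = float(np.mean(speeds > speed_limit))
    return -1.0 * collision - 0.5 * speeding

def grpo_advantages(rewards):
    """Standardize rewards within the group of G trajectories sampled
    for the same scene, the group-relative baseline used by GRPO."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Sample G candidate plans from the autoregressive predictor (stubbed
# here with random walks), score them, and compute advantages.
G = 8
rng = np.random.default_rng(0)
candidates = [np.cumsum(rng.normal(size=(20, 2)), axis=0) for _ in range(G)]
obstacles = rng.normal(scale=10.0, size=(5, 2))
advantages = grpo_advantages(
    [rule_based_reward(t, obstacles, speed_limit=1.5) for t in candidates])
```

In a full training loop, these advantages would weight the log-likelihood gradients of each sampled trajectory's motion tokens, pushing the predictor toward rule-compliant plans.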
Related papers
- Diffusion-Based Planning for Autonomous Driving with Flexible Guidance [19.204115959760788]
We propose a novel transformer-based Diffusion Planner for closed-loop planning. Our model supports joint modeling of both prediction and planning tasks. It achieves state-of-the-art closed-loop performance with robust transferability across diverse driving styles.
arXiv Detail & Related papers (2025-01-26T15:49:50Z)
- LHPF: Look back the History and Plan for the Future in Autonomous Driving [10.855426442780516]
This paper introduces LHPF, an imitation learning planner that integrates historical planning information.
Our approach employs a historical intention aggregation module that pools planning intentions from past planning cycles.
Experiments using both real-world and synthetic data demonstrate that LHPF not only surpasses existing advanced learning-based planners in planning performance but also marks the first instance of a purely learning-based planner outperforming the expert.
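The summary names the aggregation module without specifying it; the pooling idea can nonetheless be sketched. The following assumes a single dot-product attention step over embeddings of past planning intentions, which is an illustrative stand-in rather than the paper's actual module.

```python
import torch

def aggregate_history(intentions: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Pool a buffer of past planning intentions into one context vector.

    intentions: (H, D) embeddings of the last H planning cycles;
    query: (D,) encoding of the current scene. A single dot-product
    attention step stands in for the paper's aggregation module.
    """
    scores = intentions @ query / intentions.shape[-1] ** 0.5  # (H,)
    weights = torch.softmax(scores, dim=0)
    return weights @ intentions  # (D,) history-aware context
```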
arXiv Detail & Related papers (2024-11-26T09:30:26Z)
- LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios that existing planners struggle with, producing well-reasoned outputs while remaining grounded by operating alongside the rule-based planner.
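The hybrid control flow this description implies can be sketched as follows; the dispatch criterion and planner interfaces are assumptions for illustration, not the paper's API.

```python
from typing import Any, Callable, Optional

def hybrid_plan(scene: Any,
                rule_planner: Callable[[Any], Any],
                llm_planner: Callable[[Any], Optional[Any]],
                is_hard: Callable[[Any], bool]) -> Any:
    """Defer to the rule-based planner by default; invoke the LLM-based
    planner only on scenarios flagged as hard, falling back to the
    rule-based output if the LLM proposal is unusable."""
    if not is_hard(scene):
        return rule_planner(scene)
    proposal = llm_planner(scene)
    return proposal if proposal is not None else rule_planner(scene)
```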
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- Integration of Reinforcement Learning Based Behavior Planning With Sampling Based Motion Planning for Automated Driving [0.5801044612920815]
We propose a method to employ a trained deep reinforcement learning policy for dedicated high-level behavior planning.
To the best of our knowledge, this work is the first to apply deep reinforcement learning in this manner.
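A minimal sketch of this two-level split, with the RL policy choosing a discrete behavior and a sampling-based planner expanding it into trajectories; the interfaces below are illustrative stubs rather than the paper's implementation.

```python
from typing import Any, Callable, Sequence

def plan_step(state: Any,
              rl_policy: Callable[[Any], str],
              motion_planner: Callable[[Any, str], Sequence[dict]]) -> dict:
    """One planning cycle: the trained RL policy picks a discrete
    behavior (e.g. "keep_lane", "change_left"), and a sampling-based
    motion planner expands it into candidate trajectories, of which
    the cheapest is executed."""
    behavior = rl_policy(state)
    candidates = motion_planner(state, behavior)
    return min(candidates, key=lambda t: t["cost"])
```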
arXiv Detail & Related papers (2023-04-17T13:49:55Z)
- Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior [135.78858513845233]
STRIVE is a method to automatically generate challenging scenarios that cause a given planner to produce undesirable behavior, like collisions.
To maintain scenario plausibility, the key idea is to leverage a learned model of traffic motion in the form of a graph-based conditional VAE.
A subsequent optimization is used to find a "solution" to the scenario, ensuring it is useful to improve the given planner.
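The core loop implied here, optimizing a traffic-model latent so the decoded scenario stresses the planner while a prior term keeps it plausible, can be sketched as follows; `decode` and `planner_cost` are hypothetical differentiable callables, and the L2 prior is a simplification of the learned traffic prior.

```python
import torch

def adversarial_latent(z_init, decode, planner_cost,
                       prior_weight=0.1, steps=100, lr=0.05):
    """Optimize a traffic-model latent so the decoded scenario maximizes
    the planner's failure cost (e.g. collision proximity) while an L2
    prior penalty keeps the scenario near the learned traffic manifold."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        scenario = decode(z)  # differentiable traffic model (VAE decoder)
        loss = -planner_cost(scenario) + prior_weight * z.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()
```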
arXiv Detail & Related papers (2021-12-09T18:03:27Z)
- Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors [124.30562402952319]
The ability to predict and plan into the future is fundamental for agents acting in the world.
Current learning approaches for visual prediction and planning fail on long-horizon tasks.
We propose a framework for visual prediction and planning that is able to overcome these limitations.
arXiv Detail & Related papers (2020-06-23T17:58:56Z)
- The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well-defined geometries, topologies, and traffic rules.
In this paper, we propose to incorporate such structured priors as a loss function.
We demonstrate the effectiveness of our approach on real-world self-driving datasets.
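One common way to realize a structured prior as a loss term is to penalize predictions that stray from known road geometry; the sketch below adds a lane-centerline distance penalty to a standard regression loss. The specific prior and weighting are assumptions for illustration, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def prediction_loss(pred_traj, gt_traj, lane_centerline, prior_weight=0.1):
    """Regression loss plus a structured prior: each predicted waypoint
    is additionally penalized by its distance to the nearest lane
    centerline point, encoding known road geometry as a loss term.

    pred_traj, gt_traj: (T, 2); lane_centerline: (M, 2) sampled points.
    """
    data_term = F.smooth_l1_loss(pred_traj, gt_traj)
    prior_term = torch.cdist(pred_traj, lane_centerline).min(dim=-1).values.mean()
    return data_term + prior_weight * prior_term
```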
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
- PiP: Planning-informed Trajectory Prediction for Autonomous Driving [69.41885900996589]
We propose planning-informed trajectory prediction (PiP) to tackle the prediction problem in the multi-agent setting.
By informing the prediction process with the planning of the ego vehicle, our method achieves state-of-the-art multi-agent forecasting performance on highway datasets.
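A minimal stand-in for plan-informed prediction: condition each agent's forecast on an encoding of the ego plan by concatenation. The architecture and dimensions below are illustrative assumptions, not PiP's actual design.

```python
import torch
import torch.nn as nn

class PlanInformedPredictor(nn.Module):
    """Forecast neighbor trajectories conditioned on the ego plan by
    concatenating an ego-plan encoding with each agent's history
    encoding before the regression head."""

    def __init__(self, hist_dim=64, plan_dim=32, horizon=12):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hist_dim + plan_dim, 128), nn.ReLU(),
            nn.Linear(128, horizon * 2),
        )

    def forward(self, agent_hist_enc, ego_plan_enc):
        # agent_hist_enc: (N, hist_dim); ego_plan_enc: (plan_dim,)
        plan = ego_plan_enc.expand(agent_hist_enc.size(0), -1)
        out = self.head(torch.cat([agent_hist_enc, plan], dim=-1))
        return out.view(agent_hist_enc.size(0), -1, 2)  # (N, horizon, 2)
```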
arXiv Detail & Related papers (2020-03-25T16:09:54Z)