Two Tasks, One Goal: Uniting Motion and Planning for Excellent End To End Autonomous Driving Performance
- URL: http://arxiv.org/abs/2504.12667v1
- Date: Thu, 17 Apr 2025 05:52:35 GMT
- Title: Two Tasks, One Goal: Uniting Motion and Planning for Excellent End To End Autonomous Driving Performance
- Authors: Lin Liu, Ziying Song, Hongyu Pan, Lei Yang, Caiyan Jia
- Abstract summary: Former end-to-end autonomous driving approaches often decouple planning and motion tasks, treating them as separate modules. We propose TTOG, a novel two-stage trajectory generation framework. In the first stage, a diverse set of trajectory candidates is generated, while the second stage focuses on refining these candidates through vehicle state information. To mitigate the issue of unavailable surrounding vehicle states, TTOG employs a self-vehicle data-trained state estimator, subsequently extended to other vehicles.
- Score: 14.665143402317685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end autonomous driving has made impressive progress in recent years. Former end-to-end autonomous driving approaches often decouple planning and motion tasks, treating them as separate modules. This separation overlooks the potential benefits that planning can gain from learning out-of-distribution data encountered in motion tasks. However, unifying these tasks poses significant challenges, such as constructing shared contextual representations and handling the unobservability of other vehicles' states. To address these challenges, we propose TTOG, a novel two-stage trajectory generation framework. In the first stage, a diverse set of trajectory candidates is generated, while the second stage focuses on refining these candidates through vehicle state information. To mitigate the issue of unavailable surrounding vehicle states, TTOG employs a self-vehicle data-trained state estimator, subsequently extended to other vehicles. Furthermore, we introduce ECSA (equivariant context-sharing scene adapter) to enhance the generalization of scene representations across different agents. Experimental results demonstrate that TTOG achieves state-of-the-art performance across both planning and motion tasks. Notably, on the challenging open-loop nuScenes dataset, TTOG reduces the L2 distance by 36.06%. Furthermore, on the closed-loop Bench2Drive dataset, our approach achieves a 22% improvement in the driving score (DS), significantly outperforming existing baselines.
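The abstract reports its open-loop result as a percentage reduction in L2 distance. As a rough illustration of what such a number measures, the sketch below computes a mean L2 displacement between a predicted and a ground-truth trajectory, plus a relative reduction between two error values. The function names and the waypoints are hypothetical, and the averaging convention is an assumption: the exact evaluation protocol used in the paper is not stated on this page.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) waypoint, e.g. in metres


def average_l2(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean Euclidean distance between time-aligned waypoints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical waypoints, for illustration only (not data from the paper).
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.4)]

error = average_l2(pred, gt)
print(f"mean L2 error: {error:.3f}")
print(f"reduction from 1.00 to 0.75: {relative_reduction(1.0, 0.75):.1f}%")
```

Published benchmarks differ in how they aggregate this error over prediction horizons, so the numbers from a sketch like this are not directly comparable to reported results.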
Related papers
- ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving [35.493857028919685]
We propose ReCogDrive, an autonomous driving system that integrates Vision-Language Models with a diffusion planner. In this paper, we use a large-scale driving question-answering dataset to train the VLMs, mitigating the domain discrepancy between generic content and real-world driving scenarios. In the second stage, we employ a diffusion-based planner to perform imitation learning, mapping representations from the latent language space to continuous driving actions.
arXiv Detail & Related papers (2025-06-09T03:14:04Z) - ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation [44.16465715911478]
We propose ORION, a holistic E2E autonomous driving framework by vision-language instructed action generation.
Our method achieves an impressive closed-loop performance of 77.74 Driving Score (DS) and 54.62% Success Rate (SR) on the challenging Bench2Drive dataset.
arXiv Detail & Related papers (2025-03-25T15:18:43Z) - DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving [62.62464518137153]
DriveTransformer is a simplified E2E-AD framework for the ease of scaling up. It is composed of three unified operations: task self-attention, sensor cross-attention, temporal cross-attention. It achieves state-of-the-art performance in both simulated closed-loop benchmark Bench2Drive and real world open-loop benchmark nuScenes with high FPS.
arXiv Detail & Related papers (2025-03-07T11:41:18Z) - Online Location Planning for AI-Defined Vehicles: Optimizing Joint Tasks of Order Serving and Spatio-Temporal Heterogeneous Model Fine-Tuning [12.784479119173223]
Vehicle crowdsensing (VCS) has emerged as a key enabler, leveraging vehicles' mobility and sensor-equipped capabilities. This work explores a promising scenario, where edge-assisted vehicles perform joint tasks of order serving and foundation model finetuning.
arXiv Detail & Related papers (2025-02-06T07:23:40Z) - DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner. Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z) - STT: Stateful Tracking with Transformers for Autonomous Driving [48.621552393062686]
Tracking objects in three-dimensional space is critical for autonomous driving.
To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present.
We propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately.
arXiv Detail & Related papers (2024-04-30T23:04:36Z) - DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving [81.04174379726251]
This paper collects a comprehensive end-to-end driving dataset named DriveCoT.
It contains sensor data, control decisions, and chain-of-thought labels to indicate the reasoning process.
We propose a baseline model called DriveCoT-Agent, trained on our dataset, to generate chain-of-thought predictions and final decisions.
arXiv Detail & Related papers (2024-03-25T17:59:01Z) - Pioneering SE(2)-Equivariant Trajectory Planning for Automated Driving [45.18582668677648]
Planning the trajectory of the controlled ego vehicle is a key challenge in automated driving.
We propose a lightweight equivariant planning model that generates multi-modal joint predictions for all vehicles.
We also propose equivariant route attraction to guide the ego vehicle along a high-level route provided by an off-the-shelf GPS navigation system.
arXiv Detail & Related papers (2024-03-17T18:53:46Z) - Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction [69.29802752614677]
RouteFormer is a novel ego-trajectory prediction network combining GPS data, environmental context, and the driver's field-of-view. To tackle data scarcity and enhance diversity, we introduce GEM, a dataset of urban driving scenarios enriched with synchronized driver field-of-view and gaze data.
arXiv Detail & Related papers (2023-12-13T23:06:30Z) - Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? [84.17711168595311]
End-to-end autonomous driving has emerged as a promising research direction to target autonomy from a full-stack perspective.
The nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models.
We introduce a new metric to evaluate whether the predicted trajectories adhere to the road.
arXiv Detail & Related papers (2023-12-05T11:32:31Z) - End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving.
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others.
We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z) - TOFG: A Unified and Fine-Grained Environment Representation in Autonomous Driving [7.787762537147956]
In autonomous driving, an accurate understanding of the environment plays a critical role in many driving tasks such as trajectory prediction and motion planning.
Many data-driven models for trajectory prediction and motion planning extract vehicle-to-vehicle and vehicle-to-lane interactions in a separate and sequential manner.
We propose an environment representation, Temporal Occupancy Flow Graph (TOFG), which unifies the map information and vehicle trajectories into a homogeneous data format.
arXiv Detail & Related papers (2023-05-31T17:43:56Z) - Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving.
We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos.
In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input.
In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z) - Vision-Guided Forecasting -- Visual Context for Multi-Horizon Time Series Forecasting [0.6947442090579469]
We tackle multi-horizon forecasting of vehicle states by fusing the two modalities.
We design and experiment with 3D convolutions for visual feature extraction and 1D convolutions for feature extraction from speed and steering angle traces.
We show that we are able to forecast a vehicle's state to various horizons, while outperforming the current state-of-the-art results on the related task of driving state estimation.
arXiv Detail & Related papers (2021-07-27T08:52:40Z) - Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction [71.97877759413272]
Trajectory prediction is a safety-critical tool for autonomous vehicles to plan and execute actions.
Recent methods have achieved strong performances using Multi-Choice Learning objectives like winner-takes-all (WTA) or best-of-many.
Our work addresses two key challenges in trajectory prediction, learning outputs, and better predictions by imposing constraints using driving knowledge.
arXiv Detail & Related papers (2021-04-16T17:58:56Z) - Deep Structured Reactive Planning [94.92994828905984]
We propose a novel data-driven, reactive planning objective for self-driving vehicles.
We show that our model outperforms a non-reactive variant in successfully completing highly complex maneuvers.
arXiv Detail & Related papers (2021-01-18T01:43:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.