Related papers: Enhancing End-to-End Autonomous Driving with Latent World Model

Related papers

World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model [18.56171397212777]
We present World4Drive, an end-to-end autonomous driving framework that employs vision foundation models to build latent world models.<n>World4Drive achieves state-of-the-art performance without manual perception annotations on open-loop nuScenes and closed-loop NavSim benchmarks.
arXiv Detail & Related papers (2025-07-01T09:36:38Z)
Tracking Meets Large Multimodal Models for Driving Scenario Understanding [76.71815464110153]
Large Multimodal Models (LMMs) have recently gained prominence in autonomous driving research. We propose to integrate tracking information as an additional input to recover 3D spatial and temporal details. We introduce a novel approach for embedding this tracking information into LMMs to enhance their understanding of driving scenarios.
arXiv Detail & Related papers (2025-03-18T17:59:12Z)
DiFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving [55.53171248839489]
We propose an ego-centric fully sparse paradigm, named DiFSD, for end-to-end self-driving. Specifically, DiFSD mainly consists of sparse perception, hierarchical interaction and iterative motion planner. Experiments conducted on nuScenes and Bench2Drive datasets demonstrate the superior planning performance and great efficiency of DiFSD.
arXiv Detail & Related papers (2024-09-15T15:55:24Z)
UnO: Unsupervised Occupancy Fields for Perception and Forecasting [33.205064287409094]
Supervised approaches leverage annotated object labels to learn a model of the world. We learn to perceive and forecast a continuous 4D occupancy field with self-supervision from LiDAR data. This unsupervised world model can be easily and effectively transferred to tasks.
arXiv Detail & Related papers (2024-06-12T23:22:23Z)
DriveLM: Driving with Graph Visual Question Answering [57.51930417790141]
We study how vision-language models (VLMs) trained on web-scale data can be integrated into end-to-end driving systems. We propose a VLM-based baseline approach (DriveLM-Agent) for jointly performing Graph VQA and end-to-end driving.
arXiv Detail & Related papers (2023-12-21T18:59:12Z)
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? [84.17711168595311]
End-to-end autonomous driving has emerged as a promising research direction to target autonomy from a full-stack perspective. nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models. We introduce a new metric to evaluate whether the predicted trajectories adhere to the road.
arXiv Detail & Related papers (2023-12-05T11:32:31Z)
Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving [68.95178518732965]
A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants. Existing works either perform object detection followed by trajectory of the detected objects, or predict dense occupancy and flow grids for the whole scene. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network.
arXiv Detail & Related papers (2023-08-02T23:39:24Z)
End-to-end Autonomous Driving: Challenges and Frontiers [45.391430626264764]
We provide a comprehensive analysis of more than 270 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. We discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework.
arXiv Detail & Related papers (2023-06-29T14:17:24Z)
Rethinking the Open-Loop Evaluation of End-to-End Autonomous Driving in nuScenes [38.43491956142818]
Planning task involves predicting the trajectory of the ego vehicle based on inputs from both internal intention and the external environment. Most existing works evaluate their performance on the nuScenes dataset using the L2 error and collision rate between the predicted trajectories and the ground truth. In this paper, we reevaluate these existing evaluation metrics and explore whether they accurately measure the superiority of different methods. Our simple method achieves similar end-to-end planning performance on the nuScenes dataset with other perception-based methods, reducing the average L2 error by about 20%.
arXiv Detail & Related papers (2023-05-17T17:59:11Z)
Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding [51.8579160500354]
We propose an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration. Results show equivalent or even more impressive performance compared to fully-supervised state-of-the-art approaches.
arXiv Detail & Related papers (2023-03-17T00:28:33Z)
Policy Pre-training for End-to-end Autonomous Driving via Self-supervised Geometric Modeling [96.31941517446859]
We propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only.
arXiv Detail & Related papers (2023-01-03T08:52:49Z)
Driving in Real Life with Inverse Reinforcement Learning [4.366642479205039]
We introduce the first learning-based planner to drive a car in dense, urban traffic using Inverse Reinforcement Learning (IRL) DriveIRL generates a diverse set of trajectory proposals, filters these with a lightweight and interpretable safety filter, and then uses a learned model to score each remaining trajectory. We validated DriveIRL on the Las Vegas Strip and demonstrated fully autonomous driving in heavy traffic.
arXiv Detail & Related papers (2022-06-07T04:36:46Z)
Fully End-to-end Autonomous Driving with Semantic Depth Cloud Mapping and Multi-Agent [2.512827436728378]
We propose a novel deep learning model trained with end-to-end and multi-task learning manners to perform both perception and control tasks simultaneously. The model is evaluated on CARLA simulator with various scenarios made of normal-adversarial situations and different weathers to mimic real-world conditions.
arXiv Detail & Related papers (2022-04-12T03:57:01Z)
Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone. Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator. We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
Deep imitative reinforcement learning approach (DIRL) achieves agile autonomous racing using visual inputs. We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
End-to-End Interactive Prediction and Planning with Optical Flow Distillation for Autonomous Driving [16.340715765227475]
We propose an end-to-end interactive neural motion planner (INMP) for autonomous driving in this paper. Our INMP first generates a feature map in bird's-eye-view space, which is then processed to detect other agents and perform interactive prediction and planning jointly. Also, we adopt an optical flow distillation paradigm, which can effectively improve the network performance while still maintaining its real-time inference speed.
arXiv Detail & Related papers (2021-04-18T14:05:18Z)
The Importance of Prior Knowledge in Precise Multimodal Prediction [71.74884391209955]
Roads have well defined geometries, topologies, and traffic rules. In this paper we propose to incorporate structured priors as a loss function. We demonstrate the effectiveness of our approach on real-world self-driving datasets.
arXiv Detail & Related papers (2020-06-04T03:56:11Z)
End-to-end Autonomous Driving Perception with Sequential Latent Representation Learning [34.61415516112297]
An end-to-end approach might clean up the system and avoid huge efforts of human engineering. A latent space is introduced to capture all relevant features useful for perception, which is learned through sequential latent representation learning. The learned end-to-end perception model is able to solve the detection, tracking, localization and mapping problems altogether with only minimum human engineering efforts.
arXiv Detail & Related papers (2020-03-21T05:37:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.