Spatial Retrieval Augmented Autonomous Driving
- URL: http://arxiv.org/abs/2512.06865v1
- Date: Sun, 07 Dec 2025 14:40:49 GMT
- Title: Spatial Retrieval Augmented Autonomous Driving
- Authors: Xiaosong Jia, Chenhe Zhang, Yule Jiang, Songbur Wong, Zhiyuan Zhang, Chen Chen, Shaofeng Zhang, Xuanhe Zhou, Xue Yang, Junchi Yan, Yu-Gang Jiang
- Abstract summary: Existing autonomous driving systems rely on onboard sensors for environmental perception. We propose the spatial retrieval paradigm, introducing offline retrieved geographic images as an additional input. We will open-source dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
- Score: 81.39665750557526
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited by the drive-time perception horizon and often fails under limited view scope, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers are able to recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, introducing offline retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) without requiring additional sensors, making the paradigm a plug-and-play extension for existing AD tasks. For experiments, we first extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories. We establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the extended modality can enhance the performance of certain tasks. We will open-source dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
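As a rough illustration of the retrieval step described in the abstract, the sketch below fetches a satellite tile for one ego pose via the Google Static Maps API. This is a minimal, hedged example: the function name, zoom level, and the assumption that ego poses have already been converted to latitude/longitude are illustrative choices, not the paper's actual curation pipeline.

```python
import requests

# Google Static Maps endpoint; the API key and the pose-to-lat/lon
# conversion are assumed to exist already (not part of this sketch).
STATIC_MAPS_URL = "https://maps.googleapis.com/maps/api/staticmap"

def fetch_geo_image(lat: float, lon: float, api_key: str,
                    zoom: int = 19, size: str = "640x640") -> bytes:
    """Retrieve a satellite tile centered on one ego pose."""
    params = {
        "center": f"{lat},{lon}",
        "zoom": zoom,              # roughly 0.3 m/px at zoom 19 near the equator
        "size": size,
        "maptype": "satellite",
        "key": api_key,
    }
    resp = requests.get(STATIC_MAPS_URL, params=params, timeout=10)
    resp.raise_for_status()
    return resp.content            # PNG bytes, ready to cache offline

# Usage: cache one image per keyframe along the ego trajectory.
# img = fetch_geo_image(42.3365, -71.0614, api_key="YOUR_KEY")
# open("geo_cache/scene0001_frame000.png", "wb").write(img)
```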
Related papers
- Learning to Drive is a Free Gift: Large-Scale Label-Free Autonomy Pretraining from Unposed In-The-Wild Videos [20.73513310337503]
Ego-centric driving videos available online provide an abundant source of visual data for autonomous driving. We propose a label-free, teacher-guided framework for learning autonomous driving representations directly from unposed videos.
arXiv Detail & Related papers (2026-02-25T16:38:53Z)
- Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities. Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark. We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z)
- Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene [56.73568220959019]
Collaborative autonomous driving (CAV) seems like a promising direction, but collecting data for development is non-trivial. We introduce a novel surrogate: generating realistic perception from different viewpoints in a driving scene. We present the very first solution, using a combination of simulated collaborative data and real ego-car data.
arXiv Detail & Related papers (2025-02-10T17:07:53Z)
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology, which can enable the autonomous vehicle to perceive and understand the subtle and complex behaviors of pedestrians.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
Our method efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
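A minimal sketch of the pixel-aligned embedding idea described above, not the paper's code: project each LiDAR point into the image with the camera intrinsics and sample CNN features at that pixel. It assumes points are already in the camera frame and in front of the camera.

```python
import torch
import torch.nn as nn

def pixel_aligned_lidar_features(points, feat_map, K):
    """Gather image features at the pixel projections of LiDAR points.

    points:   (N, 3) LiDAR points, assumed already in the camera frame
              with z > 0 (in front of the camera).
    feat_map: (C, H, W) CNN feature map of the corresponding image.
    K:        (3, 3) camera intrinsics.
    """
    C, H, W = feat_map.shape
    uvw = (K @ points.T).T                       # (N, 3) homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    sampled = nn.functional.grid_sample(
        feat_map[None], grid[None, None], align_corners=True)  # (1, C, 1, N)
    img_feats = sampled[0, :, 0].T               # (N, C) pixel-aligned features
    # Concatenate 3D position with image features; a Transformer
    # encoder could then refine these per-point tokens.
    return torch.cat([points, img_feats], dim=-1)  # (N, 3 + C)
```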
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
- aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception [0.0]
This dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view.
The data was captured in highway, urban, and suburban areas during daytime, at night, and in rain.
We trained unimodal and multimodal baseline models for 3D object detection.
arXiv Detail & Related papers (2022-11-17T10:19:59Z)
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario.
The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available.
We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
- The NEOLIX Open Dataset for Autonomous Driving [1.4091801425319965]
We present the NEOLIX dataset and its applications in the autonomous driving area.
Our dataset includes about 30,000 frames with point cloud labels, and more than 600k 3D bounding boxes with annotations.
arXiv Detail & Related papers (2020-11-27T02:27:39Z)
- SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving [27.948417322786575]
We present a simple yet effective approach to generate realistic scenario sensor data.
Our approach uses texture-mapped surfels to efficiently reconstruct the scene from an initial vehicle pass or set of passes.
We then leverage a SurfelGAN network to reconstruct realistic camera images for novel positions and orientations of the self-driving vehicle.
arXiv Detail & Related papers (2020-05-08T04:01:14Z)
- PLOP: Probabilistic poLynomial Objects trajectory Planning for autonomous driving [8.105493956485583]
We use a conditional imitation learning algorithm to predict trajectories for the ego vehicle and its neighbors.
Our approach is computationally efficient and relies only on on-board sensors.
We evaluate our method offline on the publicly available nuScenes dataset.
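A rough, hedged sketch of the polynomial-trajectory idea the title refers to, with assumed names, degree, and mode count rather than the paper's actual head: a network predicts polynomial coefficients per mode plus mode probabilities, and waypoints are obtained by evaluating the polynomials over future timestamps.

```python
import numpy as np

# Hypothetical decoder for a probabilistic polynomial trajectory head.
# Degree 4, three modes, and a 4 s horizon are illustrative assumptions.
DEGREE, MODES, HORIZON_S, STEPS = 4, 3, 4.0, 20

def decode_trajectories(coeffs_x, coeffs_y, mode_logits):
    """coeffs_*: (MODES, DEGREE + 1) polynomial coefficients per mode;
    mode_logits: (MODES,) unnormalized mode scores."""
    t = np.linspace(0.0, HORIZON_S, STEPS)                  # future timestamps
    powers = t[None, :] ** np.arange(DEGREE + 1)[:, None]   # (DEGREE+1, STEPS)
    xs = coeffs_x @ powers                                  # (MODES, STEPS)
    ys = coeffs_y @ powers
    probs = np.exp(mode_logits - mode_logits.max())
    probs /= probs.sum()                                    # softmax over modes
    return xs, ys, probs

# Usage with random stand-in predictions:
rng = np.random.default_rng(0)
xs, ys, probs = decode_trajectories(
    rng.normal(size=(MODES, DEGREE + 1)),
    rng.normal(size=(MODES, DEGREE + 1)),
    rng.normal(size=MODES))
best = probs.argmax()   # index of the most likely mode's (x, y) waypoints
```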
arXiv Detail & Related papers (2020-03-09T16:55:07Z)