RAP: 3D Rasterization Augmented End-to-End Planning
- URL: http://arxiv.org/abs/2510.04333v1
- Date: Sun, 05 Oct 2025 19:31:24 GMT
- Title: RAP: 3D Rasterization Augmented End-to-End Planning
- Authors: Lan Feng, Yang Gao, Eloi Zablocki, Quanyi Li, Wuyang Li, Sichao Liu, Matthieu Cord, Alexandre Alahi,
- Abstract summary: Imitation learning for end-to-end driving trains policies only on expert demonstrations.<n>We propose 3D Rasterization, which replaces costly rendering with lightweightization of annotated primitives.<n>RAP achieves state-of-the-art closed-loop and long-tail robustness, ranking first on four major benchmarks.
- Score: 104.52778241744522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imitation learning for end-to-end driving trains policies only on expert demonstrations. Once deployed in a closed loop, such policies lack recovery data: small mistakes cannot be corrected and quickly compound into failures. A promising direction is to generate alternative viewpoints and trajectories beyond the logged path. Prior work explores photorealistic digital twins via neural rendering or game engines, but these methods are prohibitively slow and costly, and thus mainly used for evaluation. In this work, we argue that photorealism is unnecessary for training end-to-end planners. What matters is semantic fidelity and scalability: driving depends on geometry and dynamics, not textures or lighting. Motivated by this, we propose 3D Rasterization, which replaces costly rendering with lightweight rasterization of annotated primitives, enabling augmentations such as counterfactual recovery maneuvers and cross-agent view synthesis. To transfer these synthetic views effectively to real-world deployment, we introduce a Raster-to-Real feature-space alignment that bridges the sim-to-real gap. Together, these components form Rasterization Augmented Planning (RAP), a scalable data augmentation pipeline for planning. RAP achieves state-of-the-art closed-loop robustness and long-tail generalization, ranking first on four major benchmarks: NAVSIM v1/v2, Waymo Open Dataset Vision-based E2E Driving, and Bench2Drive. Our results show that lightweight rasterization with feature alignment suffices to scale E2E training, offering a practical alternative to photorealistic rendering. Project page: https://alan-lanfeng.github.io/RAP/.
Related papers
- EAG-PT: Emission-Aware Gaussians and Path Tracing for Indoor Scene Reconstruction and Editing [63.64931392651423]
We propose Emission-Aware Gaussians and Path Tracing (EAG-PT), aiming for physically based light transport with a unified 2D Gaussian representation.<n>Our design is based on three cores: (1) using 2D Gaussians as a unified scene representation and transport-friendly geometry proxy that avoids reconstructed mesh, (2) explicitly separating emissive and non-emissive components during reconstruction for further scene editing, and (3) decoupling reconstruction from final rendering by using efficient single-bounce optimization and high-quality multi-bounce path tracing after scene editing.
arXiv Detail & Related papers (2026-01-30T15:16:37Z) - A Generalizable Light Transport 3D Embedding for Global Illumination [30.088406137167997]
We propose a generalizable 3D light transport embedding that approximates global illumination directly from 3D scene configurations.<n>A scalable transformer models global point-to-point interactions to encode these features into neural primitives.<n>We demonstrate results on diffuse global illumination prediction across diverse indoor scenes with varying layouts, geometry, and materials.
arXiv Detail & Related papers (2025-10-21T00:29:09Z) - GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field [17.57215792490409]
GSFF-SLAM is a novel dense semantic SLAM system based on 3D Gaussian Splatting.<n>Our method supports semantic reconstruction using various forms of 2D priors, particularly sparse and noisy signals.<n>When utilizing 2D ground truth priors, GSFF-SLAM achieves state-of-the-art semantic segmentation performance with 95.03% mIoU.
arXiv Detail & Related papers (2025-04-28T01:21:35Z) - Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.<n>Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.<n>Our insights lead to effective finetuning that results in a $68.8%$ reduction in FID for novel view synthesis over prior arts.
arXiv Detail & Related papers (2024-12-19T03:39:13Z) - GS-IR: 3D Gaussian Splatting for Inverse Rendering [71.14234327414086]
We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS)
We extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions.
The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering.
arXiv Detail & Related papers (2023-11-26T02:35:09Z) - A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware
Image Synthesis [163.96778522283967]
We propose a shading-guided generative implicit model that is able to learn a starkly improved shape representation.
An accurate 3D shape should also yield a realistic rendering under different lighting conditions.
Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis.
arXiv Detail & Related papers (2021-10-29T10:53:12Z) - Recovering and Simulating Pedestrians in the Wild [81.38135735146015]
We propose to recover the shape and motion of pedestrians from sensor readings captured in the wild by a self-driving car driving around.
We incorporate the reconstructed pedestrian assets bank in a realistic 3D simulation system.
We show that the simulated LiDAR data can be used to significantly reduce the amount of real-world data required for visual perception tasks.
arXiv Detail & Related papers (2020-11-16T17:16:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.