DriveExplorer: Images-Only Decoupled 4D Reconstruction with Progressive Restoration for Driving View Extrapolation
- URL: http://arxiv.org/abs/2512.23983v1
- Date: Tue, 30 Dec 2025 04:41:56 GMT
- Title: DriveExplorer: Images-Only Decoupled 4D Reconstruction with Progressive Restoration for Driving View Extrapolation
- Authors: Yuang Jia, Jinlong Wang, Jiayi Zhao, Chunlam Li, Shunzhou Wang, Wei Gao,
- Abstract summary: This paper presents an effective solution for view extrapolation in autonomous driving scenarios.<n>Recent approaches focus on generating shifted novel view images from given viewpoints using diffusion models.<n>Compared with baselines, our method produces higher-quality images at novel extrapolated viewpoints.
- Score: 12.714087160353317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents an effective solution for view extrapolation in autonomous driving scenarios. Recent approaches focus on generating shifted novel view images from given viewpoints using diffusion models. However, these methods heavily rely on priors such as LiDAR point clouds, 3D bounding boxes, and lane annotations, which demand expensive sensors or labor-intensive labeling, limiting applicability in real-world deployment. In this work, with only images and optional camera poses, we first estimate a global static point cloud and per-frame dynamic point clouds, fusing them into a unified representation. We then employ a deformable 4D Gaussian framework to reconstruct the scene. The initially trained 4D Gaussian model renders degraded and pseudo-images to train a video diffusion model. Subsequently, progressively shifted Gaussian renderings are iteratively refined by the diffusion model,and the enhanced results are incorporated back as training data for 4DGS. This process continues until extrapolation reaches the target viewpoints. Compared with baselines, our method produces higher-quality images at novel extrapolated viewpoints.
Related papers
- LaVR: Scene Latent Conditioned Generative Video Trajectory Re-Rendering using Large 4D Reconstruction Models [52.656349227001925]
Given a monocular video, the goal of video re-rendering is to generate views of the scene from a novel camera trajectory.<n>Existing methods face two distinct challenges.<n>We propose to address these challenges by using the implicit geometric knowledge embedded in the latent space of a large 4D reconstruction model.
arXiv Detail & Related papers (2026-01-21T05:46:03Z) - Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models [79.06910348413861]
We introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image.<n>Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion.
arXiv Detail & Related papers (2025-11-01T11:16:25Z) - SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting [83.5106058182799]
We introduce SEE4D, a pose-free, trajectory-to-camera framework for 4D world modeling from casual videos.<n>A view-conditional video in model is trained to learn a robust geometry prior to denoising realistically synthesized images.<n>We validate See4D on cross-view video generation and sparse reconstruction benchmarks.
arXiv Detail & Related papers (2025-10-30T17:59:39Z) - WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving [21.778139777889397]
We propose WorldSplat, a novel feed-forward framework for 4D driving-scene generation.<n>Our approach effectively generates consistent multi-track videos through two key steps.<n>Experiments conducted on benchmark datasets demonstrate that WorldSplat effectively generates high-fidelity, temporally and spatially consistent novel view driving videos.
arXiv Detail & Related papers (2025-09-27T16:47:44Z) - Easi3R: Estimating Disentangled Motion from DUSt3R Without Training [69.51086319339662]
We introduce Easi3R, a simple yet efficient training-free method for 4D reconstruction.<n>Our approach applies attention adaptation during inference, eliminating the need for from-scratch pre-training or network fine-tuning.<n>Our experiments on real-world dynamic videos demonstrate that our lightweight attention adaptation significantly outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2025-03-31T17:59:58Z) - DrivingRecon: Large 4D Gaussian Reconstruction Model For Autonomous Driving [83.27075316161086]
Photorealistic 4D reconstruction of street scenes is essential for developing real-world simulators in autonomous driving.<n>We introduce the Large 4D Gaussian Reconstruction Model (DrivingRecon), a generalizable driving scene reconstruction model.<n>We show that DrivingRecon significantly improves scene reconstruction quality and novel view synthesis compared to existing methods.
arXiv Detail & Related papers (2024-12-12T08:10:31Z) - NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model [57.92709692193132]
NovelGS is a diffusion model for Gaussian Splatting given sparse-view images.
We leverage the novel view denoising through a transformer-based network to generate 3D Gaussians.
arXiv Detail & Related papers (2024-11-25T07:57:17Z) - SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at the viewpoint that deviates significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging prior from a Diffusion Model.
arXiv Detail & Related papers (2024-03-29T09:20:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.