FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis
- URL: http://arxiv.org/abs/2512.04830v1
- Date: Thu, 04 Dec 2025 14:14:21 GMT
- Title: FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis
- Authors: Shijie Chen, Peixi Peng,
- Abstract summary: FreeGen is a feed-forward reconstruction-generation co-training framework for free-viewpoint driving scene.<n>We show that FreeGen achieves state-of-the-art performance for free-viewpoint driving scene synthesis.
- Score: 24.76850299847588
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Closed-loop simulation and scalable pre-training for autonomous driving require synthesizing free-viewpoint driving scenes. However, existing datasets and generative pipelines rarely provide consistent off-trajectory observations, limiting large-scale evaluation and training. While recent generative models demonstrate strong visual realism, they struggle to jointly achieve interpolation consistency and extrapolation realism without per-scene optimization. To address this, we propose FreeGen, a feed-forward reconstruction-generation co-training framework for free-viewpoint driving scene synthesis. The reconstruction model provides stable geometric representations to ensure interpolation consistency, while the generation model performs geometry-aware enhancement to improve realism at unseen viewpoints. Through co-training, generative priors are distilled into the reconstruction model to improve off-trajectory rendering, and the refined geometry in turn offers stronger structural guidance for generation. Experiments demonstrate that FreeGen achieves state-of-the-art performance for free-viewpoint driving scene synthesis.
Related papers
- DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer [62.18680935878919]
We introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings into temporally consistent outputs.<n>At its core is a single-step temporally-conditioned enhancer capable of running in online simulators on a single GPU.
arXiv Detail & Related papers (2026-02-27T15:35:30Z) - DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving [49.11389494068169]
We present DrivingGen, the first comprehensive benchmark for generative driving world models.<n>DrivingGen combines a diverse evaluation dataset curated from both driving datasets and internet-scale video sources.<n>General models look better but break physics, while driving-specific ones capture motion realistically but lag in visual quality.
arXiv Detail & Related papers (2026-01-04T13:36:21Z) - SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration [37.202523124756034]
Current approaches often falter in large-angle novel view synthesis and suffer from geometric or lighting artifacts during asset manipulation.<n>We propose SymDrive, a unified diffusion-based framework capable of joint high-quality rendering and scene editing.<n>We demonstrate that SymDrive achieves photorealistic state-of-the-art performance in both novel-view enhancement and realistic 3D vehicle insertion.
arXiv Detail & Related papers (2025-12-25T10:28:43Z) - Optimization-Guided Diffusion for Interactive Scene Generation [52.23368750264419]
We present OMEGA, an optimization-guided, training-free framework that enforces structural consistency and interaction awareness during diffusion-based sampling.<n>We show that OMEGA improves generation realism, consistency, and controllability, increasing the ratio of physically and behaviorally valid scenes.<n>Our approach can also generate $5times$ more near-collision frames with a time-to-collision under three seconds.
arXiv Detail & Related papers (2025-12-08T15:56:18Z) - HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving [59.55918581964678]
HybridWorldSim is a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents.<n>We release a new multi-traversal dataset MIRROR that captures a wide range of routes and environmental conditions across different cities.
arXiv Detail & Related papers (2025-11-27T07:53:16Z) - Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method [54.461213497603154]
Occupancy-centric methods have recently achieved state-of-the-art results by offering consistent conditioning across frames and modalities.<n>Nuplan-Occ is the largest occupancy dataset to date, constructed from the widely used Nuplan benchmark.<n>We develop a unified framework that jointly synthesizes high-quality occupancy, multi-view videos, and LiDAR point clouds.
arXiv Detail & Related papers (2025-10-27T03:52:45Z) - ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech.<n>The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture.<n>To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z) - Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model [63.336123527432136]
We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation.<n>Unlike existing video generative models for autonomous driving, the proposed designs are tailored for interactive simulation.<n>We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-12-11T06:35:18Z) - Driving View Synthesis on Free-form Trajectories with Generative Prior [39.24591650300784]
DriveX is a novel free-form driving view synthesis framework.<n>It distills generative prior into the 3D Gaussian model during its optimization.<n>It achieves high-quality view synthesis beyond recorded trajectories in real time.
arXiv Detail & Related papers (2024-12-02T17:07:53Z) - HarmonicNeRF: Geometry-Informed Synthetic View Augmentation for 3D Scene Reconstruction in Driving Scenarios [2.949710700293865]
HarmonicNeRF is a novel approach for outdoor self-supervised monocular scene reconstruction.
It capitalizes on the strengths of NeRF and enhances surface reconstruction accuracy by augmenting the input space with geometry-informed synthetic views.
Our approach establishes new benchmarks in synthesizing novel depth views and reconstructing scenes, significantly outperforming existing methods.
arXiv Detail & Related papers (2023-10-09T07:42:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.