Driving Scene Synthesis on Free-form Trajectories with Generative Prior
- URL: http://arxiv.org/abs/2412.01717v1
- Date: Mon, 02 Dec 2024 17:07:53 GMT
- Title: Driving Scene Synthesis on Free-form Trajectories with Generative Prior
- Authors: Zeyu Yang, Zijie Pan, Yuankun Yang, Xiatian Zhu, Li Zhang
- Abstract summary: We propose a novel free-form driving view synthesis approach, dubbed DriveX. Our resulting model can produce high-fidelity virtual driving environments outside the recorded trajectory. Beyond real driving scenes, DriveX can also be utilized to simulate virtual driving worlds from AI-generated videos.
- Score: 39.24591650300784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driving scene synthesis along free-form trajectories is essential for driving simulations to enable closed-loop evaluation of end-to-end driving policies. While existing methods excel at novel view synthesis on recorded trajectories, they face challenges with novel trajectories due to the limited views of driving videos and the vastness of driving environments. To tackle this challenge, we propose a novel free-form driving view synthesis approach, dubbed DriveX, which leverages a video generative prior to optimize a 3D model across a variety of trajectories. Concretely, we craft an inverse problem that enables a video diffusion model to be used as a prior for many-trajectory optimization of a parametric 3D model (e.g., Gaussian splatting). To seamlessly integrate the generative prior, we conduct this process iteratively during optimization. The resulting model can produce high-fidelity virtual driving environments outside the recorded trajectory, enabling free-form trajectory driving simulation. Beyond real driving scenes, DriveX can also simulate virtual driving worlds from AI-generated videos.
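The abstract's core loop (render the parametric 3D model along sampled free-form trajectories, restore the degraded render with the video diffusion prior, then fit the model to the restored video) can be caricatured in a few lines. Everything below is a toy stand-in: a scalar "scene", a linear renderer, and a prior that nudges frames toward a fixed target are invented for illustration and are not the DriveX implementation.

```python
import random

random.seed(0)

def render(scene, trajectory):
    """Stand-in for rasterizing the 3D model (e.g. Gaussian splats) along a
    camera trajectory; returns one scalar 'frame' per pose."""
    return [scene["value"] + 0.05 * pose for pose in trajectory]

def diffusion_prior(frames):
    """Stand-in for the video diffusion model used as a restoration prior:
    it pulls degraded renders toward a plausible target video (here, 1.0)."""
    target = 1.0
    return [f + 0.5 * (target - f) for f in frames]

def optimize_step(scene, frames, refined, lr=0.1):
    """Fit the scene parameters to the prior-refined video (L2-style update)."""
    grad = sum(f - r for f, r in zip(frames, refined)) / len(frames)
    scene["value"] -= lr * grad
    return scene

scene = {"value": 0.0}
recorded = [0, 1, 2, 3]  # poses of the recorded trajectory
for _ in range(50):
    # Sample a free-form trajectory around the recorded one each iteration.
    trajectory = [p + random.uniform(-0.5, 0.5) for p in recorded]
    frames = render(scene, trajectory)
    refined = diffusion_prior(frames)  # generative prior supplies supervision
    scene = optimize_step(scene, frames, refined)
```

The point of the loop is that supervision comes from the prior-restored frames rather than from recorded images, so the 3D model is optimized on trajectories that were never driven.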
Related papers
- VDEGaussian: Video Diffusion Enhanced 4D Gaussian Splatting for Dynamic Urban Scenes Modeling [68.65587507038539]
We present a novel video diffusion-enhanced 4D Gaussian Splatting framework for dynamic urban scene modeling. Our key insight is to distill robust, temporally consistent priors from a test-time adapted video diffusion model. Our method significantly enhances dynamic modeling, especially for fast-moving objects, achieving an approximate PSNR gain of 2 dB.
arXiv Detail & Related papers (2025-08-04T07:24:05Z) - Enhanced Velocity Field Modeling for Gaussian Video Reconstruction [21.54297055995746]
High-fidelity 3D video reconstruction is essential for enabling real-time rendering of dynamic scenes with realistic motion in virtual and augmented reality (VR/AR). We propose a flow-empowered velocity field modeling scheme tailored for Gaussian video reconstruction, dubbed FlowGaussian-VR. It consists of two core components: a velocity field rendering (VFR) pipeline which enables optical flow-based optimization, and a flow-assisted adaptive densification (FAD) strategy that adjusts the number and size of Gaussians in dynamic regions.
arXiv Detail & Related papers (2025-07-31T16:26:22Z) - ACT-R: Adaptive Camera Trajectories for Single View 3D Reconstruction [12.942796503696194]
We introduce the simple idea of adaptive view planning to multi-view synthesis. We generate a sequence of views, leveraging temporal consistency to enhance 3D coherence. Our method improves 3D reconstruction over SOTA alternatives on the unseen GSO dataset.
arXiv Detail & Related papers (2025-05-13T05:31:59Z) - MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction [44.592566642185425]
MuDG is an innovative framework that integrates a Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction. We show that MuDG outperforms existing methods in both reconstruction and photorealistic synthesis quality.
arXiv Detail & Related papers (2025-03-13T17:48:41Z) - Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation [1.0027737736304287]
We introduce a hybrid approach that combines the strengths of neural reconstruction with physics-based rendering. Our approach significantly enhances novel view synthesis quality, especially for road surfaces and lane markings. We achieve this by training a customized NeRF model on the original images with depth regularization derived from a noisy LiDAR point cloud.
arXiv Detail & Related papers (2025-03-12T15:18:50Z) - DreamDrive: Generative 4D Scene Modeling from Street View Images [55.45852373799639]
We present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction.
Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references.
We then render 3D-consistent driving videos via Gaussian splatting.
arXiv Detail & Related papers (2024-12-31T18:59:57Z) - StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models [59.55232046525733]
We introduce StreetCrafter, a controllable video diffusion model that utilizes LiDAR point cloud renderings as pixel-level conditions. In addition, the utilization of pixel-level LiDAR conditions allows us to make accurate pixel-level edits to target scenes. Our model enables flexible control over viewpoint changes, enlarging the view for satisfying rendering regions.
arXiv Detail & Related papers (2024-12-17T18:58:55Z) - Stag-1: Towards Realistic 4D Driving Simulation with Video Generation Model [83.31688383891871]
We propose a Spatial-Temporal simulAtion for drivinG (Stag-1) model to reconstruct real-world scenes.
Stag-1 constructs continuous 4D point cloud scenes using surround-view data from autonomous vehicles.
It decouples spatial-temporal relationships and produces coherent driving videos.
arXiv Detail & Related papers (2024-12-06T18:59:56Z) - InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models [75.03495065452955]
We present InfiniCube, a scalable method for generating dynamic 3D driving scenes with high fidelity and controllability.
Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.
arXiv Detail & Related papers (2024-12-05T07:32:20Z) - From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events [5.132984904858975]
Testing Automated Driving Systems (ADS) in simulation with realistic driving scenarios is important for verifying their performance.
We propose a novel framework that automates the conversion of real-world car crash videos into detailed simulation scenarios.
Our preliminary results demonstrate substantial time efficiency, finishing the real-to-sim conversion in minutes with full automation and no human intervention.
arXiv Detail & Related papers (2024-11-25T01:01:54Z) - FreeVS: Generative View Synthesis on Free Driving Trajectory [55.49370963413221]
FreeVS is a novel fully generative approach that can synthesize camera views on free new trajectories in real driving scenes.
FreeVS can be applied to any validation sequence without a reconstruction process and can synthesize views on novel trajectories.
arXiv Detail & Related papers (2024-10-23T17:59:11Z) - DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation [10.296670127024045]
DriveScape is an end-to-end framework for multi-view, 3D condition-guided video generation.
Our Bi-Directional Modulated Transformer (BiMot) ensures precise alignment of 3D structural information.
DriveScape excels in video generation performance, achieving state-of-the-art results on the nuScenes dataset with an FID score of 8.34 and an FVD score of 76.39.
arXiv Detail & Related papers (2024-09-09T09:43:17Z) - GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model [6.144680854063938]
GenDDS is a novel approach for generating driving scenarios for autonomous driving systems.
We employ the KITTI dataset, which includes real-world driving videos, to train the model.
We demonstrate that our model can generate high-quality driving videos that closely replicate the complexity and variability of real-world driving scenarios.
arXiv Detail & Related papers (2024-08-28T15:37:44Z) - AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction [17.600027937450342]
AutoSplat is a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes.
Our method enables multi-view consistent simulation of challenging scenarios including lane changes.
arXiv Detail & Related papers (2024-07-02T18:36:50Z) - SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model.
arXiv Detail & Related papers (2024-03-29T09:20:29Z) - TC4D: Trajectory-Conditioned Text-to-4D Generation [94.90700997568158]
We propose TC4D: trajectory-conditioned text-to-4D generation, which factors motion into global and local components.
We learn local deformations that conform to the global trajectory using supervision from a text-to-video model.
Our approach enables the synthesis of scenes animated along arbitrary trajectories, compositional scene generation, and significant improvements to the realism and amount of generated motion.
arXiv Detail & Related papers (2024-03-26T17:55:11Z) - TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z) - Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion [83.88829943619656]
We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.
Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups.
We propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios.
arXiv Detail & Related papers (2023-04-04T15:46:42Z) - Path Planning Followed by Kinodynamic Smoothing for Multirotor Aerial Vehicles (MAVs) [61.94975011711275]
We propose a geometrically based motion planning technique, "RRT*", for this purpose.
In the proposed technique, we modify the original RRT* by introducing an adaptive search space and a steering function.
We have tested the proposed technique in various simulated environments.
arXiv Detail & Related papers (2020-08-29T09:55:49Z) - LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World [84.57894492587053]
We develop a novel simulator that captures both the power of physics-based and learning-based simulation.
We first utilize ray casting over the 3D scene and then use a deep neural network to produce deviations from the physics-based simulation.
We showcase LiDARsim's usefulness for testing perception algorithms on long-tail events and for end-to-end closed-loop evaluation on safety-critical scenarios.
arXiv Detail & Related papers (2020-06-16T17:44:35Z)
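The LiDARsim summary above describes a two-stage hybrid: physics-based ray casting over the 3D scene, followed by a deep network that predicts deviations from the physics result. A minimal sketch of that composition follows; both stages are toy stand-ins (the constant-depth ray cast and the 2% range bias are invented for illustration, not LiDARsim's actual models).

```python
def raycast(scene_depth, n_beams):
    """Stand-in for casting n_beams rays into a 3D scene: here every beam
    simply reads back a constant ground-truth depth in meters."""
    return [scene_depth for _ in range(n_beams)]

def learned_residual(depths):
    """Stand-in for the deep network that predicts per-ray deviations from
    the physics simulation (e.g. sensor bias, retro-reflection artifacts)."""
    return [0.02 * d for d in depths]  # pretend the net learned a 2% range bias

def simulate_lidar(scene_depth, n_beams=8):
    """Hybrid simulation: physics-based sweep plus learned correction."""
    physics = raycast(scene_depth, n_beams)
    residual = learned_residual(physics)
    return [p + r for p, r in zip(physics, residual)]

beams = simulate_lidar(10.0)  # each of the 8 beams lands near 10.2 m
```

The design choice mirrored here is that the network never replaces the physics; it only corrects it, so the simulator stays grounded in scene geometry while matching real sensor statistics.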
This list is automatically generated from the titles and abstracts of the papers on this site.