StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
- URL: http://arxiv.org/abs/2412.13188v1
- Date: Tue, 17 Dec 2024 18:58:55 GMT
- Title: StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
- Authors: Yunzhi Yan, Zhen Xu, Haotong Lin, Haian Jin, Haoyu Guo, Yida Wang, Kun Zhan, Xianpeng Lang, Hujun Bao, Xiaowei Zhou, Sida Peng
- Abstract summary: We introduce StreetCrafter, a controllable video diffusion model that utilizes LiDAR point cloud renderings as pixel-level conditions.
In addition, the utilization of pixel-level LiDAR conditions allows us to make accurate pixel-level edits to target scenes.
Our model enables flexible control over viewpoint changes and enlarges the view synthesis regions for satisfactory rendering.
- Score: 59.55232046525733
- License:
- Abstract: This paper aims to tackle the problem of photorealistic view synthesis from vehicle sensor data. Recent advancements in neural scene representation have achieved notable success in rendering high-quality autonomous driving scenes, but the performance significantly degrades as the viewpoint deviates from the training trajectory. To mitigate this problem, we introduce StreetCrafter, a novel controllable video diffusion model that utilizes LiDAR point cloud renderings as pixel-level conditions, which fully exploits the generative prior for novel view synthesis while preserving precise camera control. Moreover, the utilization of pixel-level LiDAR conditions allows us to make accurate pixel-level edits to target scenes. In addition, the generative prior of StreetCrafter can be effectively incorporated into dynamic scene representations to achieve real-time rendering. Experiments on the Waymo Open Dataset and PandaSet demonstrate that our model enables flexible control over viewpoint changes and enlarges the view synthesis regions for satisfactory rendering, outperforming existing methods.
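As a rough illustration of the pixel-level conditioning idea, the sketch below rasterizes a colored LiDAR point cloud into a pixel-aligned condition image for a target camera pose. This is a minimal, hypothetical example under assumed calibration (the function name, arguments, and z-buffer splatting are illustrative, not the authors' implementation), but it conveys why rendering the point cloud from a new viewpoint, or editing the points beforehand, gives the video diffusion model precise pixel-level control.

```python
# Illustrative sketch (not the authors' code): rasterize a colored LiDAR point
# cloud into a pixel-aligned condition image for a target camera pose. Such a
# rendering is the kind of pixel-level condition a StreetCrafter-style video
# diffusion model consumes. All names and arguments here are hypothetical.
import numpy as np

def project_lidar_to_image(points_lidar, colors, K, T_cam_from_lidar, hw):
    """Rasterize colored LiDAR points into an HxWx3 condition image.

    points_lidar     : (N, 3) points in the LiDAR frame
    colors           : (N, 3) per-point colors in [0, 1]
    K                : (3, 3) camera intrinsics
    T_cam_from_lidar : (4, 4) rigid transform from LiDAR to camera frame
    hw               : (H, W) output resolution
    """
    H, W = hw
    # Transform points into the camera frame.
    pts_h = np.concatenate([points_lidar, np.ones((len(points_lidar), 1))], axis=1)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]

    # Keep points in front of the camera and project with the pinhole model.
    front = pts_cam[:, 2] > 0.1
    pts_cam, colors = pts_cam[front], colors[front]
    uv = (K @ pts_cam.T).T
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Z-buffered splat: the nearest point wins each pixel.
    cond = np.zeros((H, W, 3), dtype=np.float32)
    depth = np.full((H, W), np.inf)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    for ui, vi, zi, ci in zip(u[inside], v[inside], z[inside], colors[inside]):
        if zi < depth[vi, ui]:
            depth[vi, ui] = zi
            cond[vi, ui] = ci
    return cond  # pixel-level condition image for the target viewpoint
```

Because the condition is formed per pixel, moving or deleting points before rasterization translates directly into the corresponding edit in the generated frames, which is what enables the accurate pixel-level scene edits described in the abstract.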
Related papers
- FreeVS: Generative View Synthesis on Free Driving Trajectory [55.49370963413221]
FreeVS is a novel fully generative approach that can synthesize camera views on free new trajectories in real driving scenes.
FreeVS can be applied to any validation sequences without reconstruction process and synthesis views on novel trajectories.
arXiv Detail & Related papers (2024-10-23T17:59:11Z)
- Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling [70.34875558830241]
We present a way of learning a spatio-temporal (4D) embedding, based on semantic gears, to allow for stratified modeling of the dynamic regions of the scene.
At the same time, almost for free, our approach enables free-viewpoint tracking of objects of interest - a functionality not yet achieved by existing NeRF-based methods.
arXiv Detail & Related papers (2024-06-06T03:37:39Z)
- LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes [73.65115834242866]
Photorealistic simulation plays a crucial role in applications such as autonomous driving.
However, reconstruction quality suffers on street scenes due to collinear camera motions and sparser samplings at higher speeds.
We propose several insights that allow a better utilization of Lidar data to improve NeRF quality on street scenes.
arXiv Detail & Related papers (2024-05-01T23:07:12Z)
- SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model.
arXiv Detail & Related papers (2024-03-29T09:20:29Z)
- UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video [22.613946218766802]
We present UrbanIR, a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions from a single video.
It accurately infers shape, albedo, visibility, and sun and sky illumination from wide-baseline videos, such as those from car-mounted cameras.
UrbanIR uses novel losses that reduce errors in inverse graphics inference and suppress rendering artifacts.
arXiv Detail & Related papers (2023-06-15T17:59:59Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
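For intuition about the image-based rendering step summarized in the DynIBaR entry above, here is a minimal, hypothetical sketch of aggregating per-pixel features from nearby source views using distance-based weights; the actual method learns the aggregation and explicitly models scene motion, which this toy example omits.

```python
# Minimal sketch (not DynIBaR's actual code): image-based rendering blends
# per-pixel features from nearby source views, weighted by how close each
# source camera is to the target view. All names here are hypothetical.
import numpy as np

def aggregate_nearby_view_features(target_cam_center, source_cam_centers, source_features):
    """Blend per-pixel features from nearby source views into a target view.

    target_cam_center  : (3,) target camera position
    source_cam_centers : (V, 3) positions of the V nearby source cameras
    source_features    : (V, H, W, C) features resampled into the target view
                         from each source view (e.g. via reprojection)
    """
    # Softmax over negative camera distances: closer views get larger weights.
    dists = np.linalg.norm(source_cam_centers - target_cam_center, axis=1)
    logits = -dists
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Weighted blend of the per-view feature maps -> (H, W, C).
    return np.tensordot(weights, source_features, axes=(0, 0))
```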