High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion
- URL: http://arxiv.org/abs/2502.12752v1
- Date: Tue, 18 Feb 2025 11:13:06 GMT
- Title: High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion
- Authors: Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers
- Abstract summary: We introduce SplatDiff, a pixel-splatting-guided video diffusion model designed to synthesize high-fidelity novel views from a single image.
To mitigate texture hallucination, we design a texture bridge module that enables high-fidelity texture generation through adaptive feature fusion.
In experiments, SplatDiff shows remarkable zero-shot performance across diverse tasks, including sparse-view NVS and stereo video conversion.
- Score: 15.244909728255417
- License:
- Abstract: Despite recent advances in Novel View Synthesis (NVS), generating high-fidelity views from single or sparse observations remains a significant challenge. Existing splatting-based approaches often produce distorted geometry due to splatting errors. While diffusion-based methods leverage rich 3D priors to achieve improved geometry, they often suffer from texture hallucination. In this paper, we introduce SplatDiff, a pixel-splatting-guided video diffusion model designed to synthesize high-fidelity novel views from a single image. Specifically, we propose an aligned synthesis strategy for precise control of target viewpoints and geometry-consistent view synthesis. To mitigate texture hallucination, we design a texture bridge module that enables high-fidelity texture generation through adaptive feature fusion. In this manner, SplatDiff leverages the strengths of splatting and diffusion to generate novel views with consistent geometry and high-fidelity details. Extensive experiments verify the state-of-the-art performance of SplatDiff in single-view NVS. Additionally, without extra training, SplatDiff shows remarkable zero-shot performance across diverse tasks, including sparse-view NVS and stereo video conversion.
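The abstract describes two ingredients: a pixel-splatting step that pins down the target viewpoint and geometry, and a texture bridge that fuses splatted detail with diffusion features. The sketch below is a minimal illustration of those two ideas, not the authors' implementation; the splatting routine, the gated fusion module, and all names, shapes, and parameters are assumptions made for this example.

```python
# Hypothetical sketch of splatting-guided view synthesis: forward-splat source
# pixels to the target view with estimated depth and a relative pose, then fuse
# splatted (high-fidelity) content with diffusion features via a learned gate.
import torch
import torch.nn as nn


def splat_to_target(src_rgb, src_depth, K, T_rel):
    """Forward-splat source pixels into the target view (nearest-pixel scatter,
    no z-buffering for brevity).

    src_rgb:   (3, H, W) source image
    src_depth: (H, W)    per-pixel depth in the source view
    K:         (3, 3)    intrinsics (assumed shared by both views)
    T_rel:     (4, 4)    source-to-target rigid transform
    """
    _, H, W = src_rgb.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1).float()

    # Back-project to 3D in the source camera, then move to the target camera.
    pts_src = torch.linalg.inv(K) @ pix * src_depth.reshape(1, -1)
    pts_h = torch.cat([pts_src, torch.ones(1, pts_src.shape[1])], dim=0)
    pts_tgt = (T_rel @ pts_h)[:3]

    # Project into the target image plane.
    proj = K @ pts_tgt
    z = proj[2].clamp(min=1e-6)
    u = (proj[0] / z).round().long()
    v = (proj[1] / z).round().long()
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (pts_tgt[2] > 0)

    # Scatter colors into the target view; untouched pixels stay zero (holes).
    out = torch.zeros_like(src_rgb).reshape(3, -1)
    out[:, v[valid] * W + u[valid]] = src_rgb.reshape(3, -1)[:, valid]
    return out.reshape(3, H, W)


class GatedTextureFusion(nn.Module):
    """Hypothetical stand-in for a texture-bridge-style module: a per-pixel gate
    decides how much splatted texture to keep versus diffusion-generated content."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, splat_feat, diff_feat):
        g = self.gate(torch.cat([splat_feat, diff_feat], dim=1))
        return g * splat_feat + (1 - g) * diff_feat


# Toy usage: with an identity pose the splatted image reproduces the input.
rgb = torch.rand(3, 64, 64)
depth = torch.ones(64, 64)
K = torch.tensor([[64.0, 0, 32], [0, 64.0, 32], [0, 0, 1]])
warped = splat_to_target(rgb, depth, K, torch.eye(4))
```

In the actual pipeline the splatted frames would condition a video diffusion model and the fusion would act on decoder features rather than raw RGB; this sketch only conveys the data flow.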
Related papers
- Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis [25.924727931514735]
Generalizable 3DGS can reconstruct new scenes from sparse-view observations in a feed-forward inference manner.
Existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes.
We propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis.
arXiv Detail & Related papers (2024-10-30T08:51:29Z)
- ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by point-based representations to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes [50.534213038479926]
FreeSplat reconstructs geometrically consistent 3D scenes from long-sequence inputs for free-view synthesis.
We propose a simple but effective free-view training strategy that ensures robust view synthesis across a broader view range, regardless of the number of input views.
arXiv Detail & Related papers (2024-05-28T08:40:14Z)
- Consistent123: Improve Consistency for One Image to 3D Object Synthesis [74.1094516222327]
Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability.
However, these models offer no guarantee of view consistency, which limits their performance on downstream tasks like 3D reconstruction and image-to-3D generation.
We propose Consistent123 to synthesize novel views simultaneously by incorporating additional cross-view attention layers and a shared self-attention mechanism (see the sketch after this list).
arXiv Detail & Related papers (2023-10-12T07:38:28Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
- ProbNVS: Fast Novel View Synthesis with Learned Probability-Guided Sampling [42.37704606186928]
We propose to build a novel view synthesis framework based on learned MVS priors.
We show that our method achieves 15 to 40 times faster rendering compared to state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-07T14:45:42Z)
- Stable View Synthesis [100.86844680362196]
We present Stable View Synthesis (SVS).
Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene.
SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets.
arXiv Detail & Related papers (2020-11-14T07:24:43Z)
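As referenced in the Consistent123 entry above, cross-view attention lets every generated view attend to the tokens of all other views so that the outputs stay mutually consistent. The following is a minimal sketch of that idea under a simple joint-attention formulation; module names, tensor shapes, and the residual layout are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical cross-view attention block: per-view feature tokens are flattened
# into one sequence so attention spans all views jointly.
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        """x: (batch, views, tokens, dim) feature tokens, e.g. from a shared denoising U-Net."""
        b, v, n, d = x.shape
        seq = x.reshape(b, v * n, d)          # merge the view axis into the sequence
        normed = self.norm(seq)
        attended, _ = self.attn(normed, normed, normed)
        return (seq + attended).reshape(b, v, n, d)  # residual connection


# Usage: fuse features of 4 candidate views with 256 tokens each.
feats = torch.randn(2, 4, 256, 64)
out = CrossViewAttention(dim=64)(feats)
print(out.shape)  # torch.Size([2, 4, 256, 64])
```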