FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
- URL: http://arxiv.org/abs/2508.06392v1
- Date: Fri, 08 Aug 2025 15:22:41 GMT
- Title: FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
- Authors: Wenbin Teng, Gonglin Chen, Haiwei Chen, Yajie Zhao
- Abstract summary: We present FVGen, a framework that enables fast novel view synthesis using Video Diffusion Models (VDMs) in as few as four sampling steps. Our framework generates the same number of novel views with similar (or even better) visual quality while reducing sampling time by more than 90%.
- Score: 7.731788894265875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in 3D reconstruction has enabled realistic 3D models from dense image captures, yet challenges persist with sparse views, often leading to artifacts in unseen areas. Recent works leverage Video Diffusion Models (VDMs) to generate dense observations, filling the gaps when only sparse views are available for 3D reconstruction tasks. A significant limitation of these methods is their slow sampling speed when using VDMs. In this paper, we present FVGen, a novel framework that addresses this challenge by enabling fast novel view synthesis using VDMs in as few as four sampling steps. We propose a novel video diffusion model distillation method that distills a multi-step denoising teacher model into a few-step denoising student model using Generative Adversarial Networks (GANs) and softened reverse KL-divergence minimization. Extensive experiments on real-world datasets show that, compared to previous works, our framework generates the same number of novel views with similar (or even better) visual quality while reducing sampling time by more than 90%. FVGen significantly improves time efficiency for downstream reconstruction tasks, particularly when working with sparse input views (more than 2) where pre-trained VDMs need to be run multiple times to achieve better spatial coverage.
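The abstract describes the distillation objective only at a high level: a few-step student trained against a multi-step teacher with a GAN loss plus softened reverse-KL minimization. As a rough illustration of that recipe (in the spirit of distribution-matching distillation, not FVGen's actual code), the PyTorch sketch below shows one student update; the module names, the four-step sampling loop, and the `alpha` softening weight are all assumptions.

```python
# A minimal sketch (not FVGen's released code) of one student update that
# combines an adversarial loss with a softened reverse-KL term, in the
# spirit of distribution-matching distillation. All module names, the
# sampling schedule, and the `alpha` softening weight are assumptions.
import torch
import torch.nn.functional as F

def student_update(student, teacher_score, fake_score, discriminator,
                   noise, cond, steps=4, alpha=0.5):
    # Few-step generation: the student denoises the video latent in `steps` passes.
    x = noise
    for t in torch.linspace(1.0, 1.0 / steps, steps):
        x = student(x, t.expand(x.shape[0]).to(x.device), cond)

    # Adversarial term: non-saturating GAN loss that rewards student samples
    # the discriminator scores as real.
    adv_loss = F.softplus(-discriminator(x, cond)).mean()

    # Softened reverse-KL term: re-noise the sample and estimate the KL
    # gradient as the difference between the student-distribution ("fake")
    # score and the frozen teacher score; `alpha` stands in for the
    # softening, whose exact form the abstract does not specify.
    t = torch.rand(x.shape[0], device=x.device)
    x_t = x + t.view(-1, 1, 1, 1, 1) * torch.randn_like(x)
    with torch.no_grad():
        grad = fake_score(x_t, t, cond) - teacher_score(x_t, t, cond)
    kl_loss = alpha * (grad * x).mean()  # gradient w.r.t. x is alpha * grad / x.numel()

    return adv_loss + kl_loss
```

In distribution-matching-style methods, the reverse-KL gradient is estimated as the difference between a score network fit to the student's own samples (`fake_score` above) and the frozen teacher's score; the sketch models the softening as a simple down-weighting, since the abstract does not specify its form.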
Related papers
- BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model [3.7515646463759698]
We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model pretrained on billions of frames as a strong backbone. We evaluate on the challenging DL3DV-10K dataset and demonstrate superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2026-02-26T03:58:42Z) - Lightweight and Accurate Multi-View Stereo with Confidence-Aware Diffusion Model [81.01939699480094]
We propose a novel framework that introduces diffusion models into multi-view stereo (MVS). Considering the discriminative characteristic of depth estimation, we design a condition encoder to guide the diffusion process. Based on this framework, we propose two novel MVS methods, DiffMVS and CasMVS.
arXiv Detail & Related papers (2025-09-18T17:59:19Z) - FlowR: Flowing from Sparse to Dense 3D Reconstructions [60.28571003356382]
We propose a flow matching model that learns a flow to connect novel view renderings from possibly sparse reconstructions to renderings that we expect from dense reconstructions (a generic sketch of this objective appears after this list). Our model is trained on a novel dataset of 3.6M image pairs and can process up to 45 views at 540x960 resolution (91K tokens) on one H100 GPU in a single forward pass.
arXiv Detail & Related papers (2025-04-02T11:57:01Z) - Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency. We present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z) - SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior [53.52396082006044]
Current methods struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints.
This issue stems from the sparse training views captured by a fixed camera on a moving vehicle.
We propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model.
arXiv Detail & Related papers (2024-03-29T09:20:29Z) - Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model that turns noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z)
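As noted in the FlowR entry above, flow matching between paired sparse- and dense-reconstruction renderings is typically trained with a simple velocity-regression objective. The sketch below is a generic illustration of that idea under assumed names (`velocity_net`, paired tensors `x_sparse`/`x_dense`), not FlowR's implementation.

```python
# Generic conditional flow-matching step (an illustration of the idea in
# the FlowR entry, not its implementation): regress a velocity field that
# transports sparse-reconstruction renderings toward their dense-
# reconstruction counterparts along a straight-line path.
import torch
import torch.nn.functional as F

def flow_matching_loss(velocity_net, x_sparse, x_dense):
    # Sample one time per pair and form the linear interpolation point.
    t = torch.rand(x_sparse.shape[0], device=x_sparse.device)
    tv = t.view(-1, 1, 1, 1)                # broadcast over (C, H, W)
    x_t = (1 - tv) * x_sparse + tv * x_dense
    target = x_dense - x_sparse             # constant velocity of the straight path
    return F.mse_loss(velocity_net(x_t, t), target)
```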