One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion
- URL: http://arxiv.org/abs/2601.14161v1
- Date: Tue, 20 Jan 2026 17:11:55 GMT
- Title: One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion
- Authors: Yitong Dong, Qi Zhang, Minchao Jiang, Zhiqiang Wu, Qingnan Fan, Ying Feng, Huaqi Zhang, Hujun Bao, Guofeng Zhang,
- Abstract summary: We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images. We design a Dual-Domain Detail Perception Module, which enables handling high-resolution images without being limited by the ViT backbone. We develop a feature-guided diffusion network, which can preserve high-frequency details during the restoration process.
- Score: 57.824020826432815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images, addressing key limitations in recent feed-forward 3D Gaussian Splatting (3DGS) methods built on Vision Transformer (ViT) backbones. While ViT-based pipelines offer strong geometric priors, they are often constrained by low-resolution inputs due to computational costs. Moreover, existing generative enhancement methods tend to be 3D-agnostic, resulting in inconsistent structures across views, especially in unseen regions. To overcome these challenges, we design a Dual-Domain Detail Perception Module, which enables handling high-resolution images without being limited by the ViT backbone, and endows Gaussians with additional features to store high-frequency details. We develop a feature-guided diffusion network, which can preserve high-frequency details during the restoration process. We introduce a unified training strategy that enables joint optimization of the ViT-based geometric backbone and the diffusion-based refinement module. Experiments demonstrate that our method can maintain superior generation quality across multiple datasets.
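The abstract describes a pipeline in which a feed-forward 3DGS backbone renders a coarse image plus per-Gaussian high-frequency features, and a one-step diffusion network then refines the render conditioned on those features. The following is a minimal, purely illustrative sketch of that one-step refinement idea; the function names, the fixed noise level, and the toy denoiser are all assumptions for illustration, not the paper's actual model or API.

```python
import numpy as np

def one_step_refine(render, features, denoiser, t=0.25, rng=None):
    """Hypothetical sketch of one-step diffusion refinement.

    render:   (H, W, 3) coarse image rendered from the feed-forward 3DGS.
    features: (H, W, C) high-frequency feature map carried by the Gaussians.
    denoiser: callable(noisy, features, t) -> predicted clean image.
    t:        fixed noise level; a one-step model denoises from a single t.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.standard_normal(render.shape)
    # Forward-diffuse the coarse render to the chosen noise level,
    # then denoise back in a single network evaluation.
    noisy = np.sqrt(1.0 - t) * render + np.sqrt(t) * noise
    return denoiser(noisy, features, t)

# Toy stand-in denoiser: blends the noisy input with the first three
# feature channels (a real system would use a trained network here).
def toy_denoiser(noisy, features, t):
    return (1.0 - t) * noisy + t * features[..., :3]

refined = one_step_refine(
    np.zeros((8, 8, 3)), np.ones((8, 8, 16)), toy_denoiser
)
print(refined.shape)  # (8, 8, 3)
```

The single denoiser call is what distinguishes a one-step refiner from an iterative diffusion sampler, which would loop this denoising step over a schedule of decreasing noise levels.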
Related papers
- BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model [3.7515646463759698]
We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model pretrained on billions of frames as a strong backbone. We evaluate on the challenging DL3DV-10K dataset and demonstrate superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2026-02-26T03:58:42Z)
- Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting [21.952325954391508]
We introduce Gesplat, a 3DGS-based framework that enables robust novel view synthesis and geometrically consistent reconstruction from unposed sparse images. Our approach achieves more robust performance on both forward-facing and large-scale complex datasets compared to other pose-free methods.
arXiv Detail & Related papers (2025-10-11T08:13:46Z)
- RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions [67.48495052903534]
We propose a general and efficient multi-view feature enhancement module, RobustGS. It substantially improves the robustness of feedforward 3DGS methods under various adverse imaging conditions. The RobustGS module can be seamlessly integrated into existing pretrained pipelines in a plug-and-play manner.
arXiv Detail & Related papers (2025-08-05T04:50:29Z)
- Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting [95.61137026932062]
Intern-GS is a novel approach to enhance the process of sparse-view Gaussian splatting. We show that Intern-GS achieves state-of-the-art rendering quality across diverse datasets.
arXiv Detail & Related papers (2025-05-27T05:17:49Z)
- SuperGS: Consistent and Detailed 3D Super-Resolution Scene Reconstruction via Gaussian Splatting [6.309174895120047]
3D Gaussian Splatting (3DGS) has excelled in novel view synthesis (NVS) with its real-time rendering capabilities and superior quality. However, it encounters challenges for high-resolution novel view synthesis (HRNVS) due to the coarse nature of primitives derived from low-resolution input views. We propose SuperGS, an expansion of Scaffold-GS designed with a two-stage coarse-to-fine training framework.
arXiv Detail & Related papers (2025-05-24T11:33:57Z)
- Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis [22.767866875051013]
We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address limitations of current methods. By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones. Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.
arXiv Detail & Related papers (2025-04-02T17:59:46Z)
- MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction [44.592566642185425]
MuDG is an innovative framework that integrates a multi-modal diffusion model with Gaussian Splatting (GS) for urban scene reconstruction. We show that MuDG outperforms existing methods in both reconstruction and photorealistic synthesis quality.
arXiv Detail & Related papers (2025-03-13T17:48:41Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Anti-Aliased Neural Implicit Surfaces with Encoding Level of Detail [54.03399077258403]
We present LoD-NeuS, an efficient neural representation for high-frequency geometry detail recovery and anti-aliased novel view rendering.
Our representation aggregates space features from a multi-convolved featurization within a conical frustum along a ray.
arXiv Detail & Related papers (2023-09-19T05:44:00Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.