BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model
- URL: http://arxiv.org/abs/2602.22596v1
- Date: Thu, 26 Feb 2026 03:58:42 GMT
- Title: BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model
- Authors: Yuci Han, Charles Toth, John E. Anderson, William J. Shuart, Alper Yilmaz
- Abstract summary: We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model, pretrained on billions of frames, as a strong backbone. We evaluate on the challenging DL3DV-10K dataset and demonstrate superior performance compared to state-of-the-art methods.
- Score: 3.7515646463759698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model, pretrained on billions of frames, as a strong backbone, aiming to mitigate artifacts and recover view-consistent details at inference time. Prior work has developed similar diffusion-based solutions to the challenges of novel view synthesis. Despite significant improvements, these methods typically rely on off-the-shelf pretrained diffusion priors and fine-tune only the UNet module while keeping other components frozen, which still leads to inconsistent details and artifacts even when incorporating geometry-aware regularizations such as depth or semantic conditions. To address this, we investigate the latent space of the diffusion model and introduce two components: (1) temporal equivariance regularization and (2) a vision foundation model-aligned representation, both applied to the variational autoencoder (VAE) module within the SVD pipeline. BetterScene integrates a feed-forward 3D Gaussian Splatting (3DGS) model to render features as inputs for the SVD enhancer and generate continuous, artifact-free, consistent novel views. We evaluate on the challenging DL3DV-10K dataset and demonstrate superior performance compared to state-of-the-art methods.
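The abstract names two VAE-level objectives but gives no formulas. The paper's exact losses are not specified here, so the following is a hypothetical minimal sketch under two assumptions: temporal equivariance is read as a smoothness penalty that keeps latents of consecutive frames close, and VFM alignment is read as a (1 - cosine similarity) term between a latent and a foundation-model feature. The encoder `encode`, the matrix `W`, and both loss functions are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the VAE encoder: a fixed linear map from a
# flattened 8x8 "frame" to a 16-dim latent (purely illustrative).
W = rng.normal(scale=0.1, size=(16, 64))

def encode(frame):
    """Map a flattened 8x8 frame to a 16-dim latent vector."""
    return W @ frame.reshape(-1)

def temporal_equivariance_loss(frames):
    """One simple reading of the regularizer: frames adjacent on a
    smooth camera path should map to adjacent latents, so penalize
    the mean squared jump between consecutive latents."""
    latents = np.stack([encode(f) for f in frames])
    diffs = latents[1:] - latents[:-1]
    return float(np.mean(diffs ** 2))

def vfm_alignment_loss(latent, vfm_feature):
    """Representation alignment as (1 - cosine similarity) between
    the VAE latent and a vision-foundation-model feature."""
    a = latent / (np.linalg.norm(latent) + 1e-8)
    b = vfm_feature / (np.linalg.norm(vfm_feature) + 1e-8)
    return float(1.0 - a @ b)

# A short clip of slowly varying frames, standing in for a video.
base = rng.normal(size=(8, 8))
frames = [base + 0.01 * t * np.ones((8, 8)) for t in range(4)]

l_eq = temporal_equivariance_loss(frames)
l_align = vfm_alignment_loss(encode(frames[0]), rng.normal(size=16))
print(l_eq, l_align)
```

In a real pipeline both terms would be added to the VAE reconstruction loss during fine-tuning; here they are only computed to show their shape and ranges (the smoothness term is non-negative, the alignment term lies in [0, 2]).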
Related papers
- ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models [27.324967736816337]
Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. We propose a two-stage pipeline that leverages two key insights. First, we train a powerful bidirectional generative model with a novel opacity mixing strategy. Second, we distill it into a causal auto-regressive model that generates hundreds of frames in a single pass.
arXiv Detail & Related papers (2026-02-28T06:22:40Z) - One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion [57.824020826432815]
We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images. We design a Dual-Domain Detail Perception Module, which enables handling high-resolution images without being limited by the ViT backbone. We develop a feature-guided diffusion network, which can preserve high-frequency details during the restoration process.
arXiv Detail & Related papers (2026-01-20T17:11:55Z) - Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting [21.952325954391508]
We introduce Gesplat, a 3DGS-based framework that enables robust novel view synthesis and geometrically consistent reconstruction from unposed sparse images. Our approach achieves more robust performance on both forward-facing and large-scale complex datasets compared to other pose-free methods.
arXiv Detail & Related papers (2025-10-11T08:13:46Z) - OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse-view Gaussian Splatting. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
arXiv Detail & Related papers (2025-09-27T11:19:32Z) - Enhancing Novel View Synthesis from extremely sparse views with SfM-free 3D Gaussian Splatting Framework [14.927184256861807]
We propose a novel SfM-free 3DGS-based method that jointly estimates camera poses and reconstructs 3D scenes from extremely sparse-view inputs. Our method significantly outperforms other state-of-the-art 3DGS-based approaches, achieving a remarkable 2.75 dB improvement in PSNR under extremely sparse-view conditions.
arXiv Detail & Related papers (2025-08-21T11:25:24Z) - DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
arXiv Detail & Related papers (2024-12-11T07:32:17Z) - S^2Former-OR: Single-Stage Bi-Modal Transformer for Scene Graph Generation in OR [50.435592120607815]
Scene graph generation (SGG) of surgical procedures is crucial in enhancing holistically cognitive intelligence in the operating room (OR).
Previous works have primarily relied on multi-stage learning, where the generated semantic scene graphs depend on intermediate processes with pose estimation and object detection.
In this study, we introduce a novel single-stage bi-modal transformer framework for SGG in the OR, termed S2Former-OR.
arXiv Detail & Related papers (2024-02-22T11:40:49Z) - Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z) - Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
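Several entries above report gains in peak signal-to-noise ratio, e.g. the 2.75 dB PSNR improvement under extremely sparse views. PSNR is a standard NVS metric computed from the mean squared error between a rendered view and its ground-truth image; a minimal numpy sketch (the function name and toy images are illustrative):

```python
import numpy as np

def psnr(reference, rendered, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images whose
    pixel values lie in [0, max_val]."""
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

ref = np.zeros((4, 4))
noisy = ref + 0.1              # uniform error of 0.1 -> MSE = 0.01
print(psnr(ref, noisy))        # ~20 dB: 10 * log10(1 / 0.1**2)
```

Because PSNR is logarithmic in the MSE, halving the per-pixel error adds about 6 dB, which puts reported gains of a few dB in context.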
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.