ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models
- URL: http://arxiv.org/abs/2603.00492v1
- Date: Sat, 28 Feb 2026 06:22:40 GMT
- Title: ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models
- Authors: Riccardo de Lutio, Tobias Fischer, Yen-Yu Chang, Yuxuan Zhang, Jay Zhangjie Wu, Xuanchi Ren, Tianchang Shen, Katarina Tothova, Zan Gojcic, Haithem Turki
- Abstract summary: Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. We propose a two-stage pipeline that leverages two key insights. First, we train a powerful bidirectional generative model with a novel opacity mixing strategy. Second, we distill it into a causal auto-regressive model that generates hundreds of frames in a single pass.
- Score: 27.324967736816337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. Methods that leverage generative priors to correct artifacts in these areas hold promise but currently suffer from two shortcomings. The first is scalability, as existing methods use image diffusion models or bidirectional video models that are limited in the number of views they can generate in a single pass (and thus require a costly iterative distillation process for consistency). The second is quality itself, as generators used in prior work tend to produce outputs that are inconsistent with existing scene content and fail entirely in completely unobserved regions. To solve these, we propose a two-stage pipeline that leverages two key insights. First, we train a powerful bidirectional generative model with a novel opacity mixing strategy that encourages consistency with existing observations while retaining the model's ability to extrapolate novel content in unseen areas. Second, we distill it into a causal auto-regressive model that generates hundreds of frames in a single pass. This model can directly produce novel views or serve as pseudo-supervision to improve the underlying 3D representation in a simple and highly efficient manner. We evaluate our method extensively and demonstrate that it can generate plausible reconstructions in scenarios where existing approaches fail completely. When measured on commonly benchmarked datasets, we outperform all existing baselines by a wide margin, exceeding prior state-of-the-art methods by 1-3 dB PSNR.
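The abstract names two mechanisms, opacity mixing and auto-regressive distillation, without spelling either out. Below is a minimal sketch of one plausible reading of opacity mixing, where the accumulated opacity of the splatted rendering gates, per pixel, whether the generator is conditioned on observed scene content or on noise it is free to replace; the function name, gating form, and threshold are assumptions, not the paper's actual formulation.

```python
import torch

def opacity_mix(rendered_rgb, accum_opacity, noise, tau=0.5, sharpness=10.0):
    """Blend an (incomplete) Gaussian-splat rendering with noise.

    Pixels the 3D representation actually observed (high accumulated
    opacity) keep the rendered colors as conditioning, encouraging
    consistency with existing content; under-observed pixels fall back
    to noise so the generative model may extrapolate freely there.

    rendered_rgb:  (B, 3, H, W) splatted colors
    accum_opacity: (B, 1, H, W) accumulated alpha from splatting, in [0, 1]
    noise:         (B, 3, H, W) Gaussian noise
    """
    # Soft gate: ~1 where the scene was observed, ~0 elsewhere.
    gate = torch.sigmoid(sharpness * (accum_opacity - tau))
    return gate * rendered_rgb + (1.0 - gate) * noise

# Toy usage on random tensors.
B, H, W = 2, 64, 64
mixed = opacity_mix(torch.rand(B, 3, H, W),
                    torch.rand(B, 1, H, W),
                    torch.randn(B, 3, H, W))
print(mixed.shape)  # torch.Size([2, 3, 64, 64])
```

The soft gate is one design choice among several; a hard threshold or a schedule over diffusion timesteps would fit the same description.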
Related papers
- BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model [3.7515646463759698]
We present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model, pretrained on billions of frames, as a strong backbone. We evaluate on the challenging DL3DV-10K dataset and demonstrate superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2026-02-26T03:58:42Z)
- G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior [53.762256749551284]
We identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. We incorporate this geometry guidance throughout the generative pipeline to improve visibility mask estimation, guide novel view selection, and enhance multi-view consistency when inpainting with video diffusion models. Our method naturally supports single-view inputs and unposed videos, with strong generalizability in both indoor and outdoor scenarios.
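The summary mentions geometry-guided visibility mask estimation without giving the computation. A common construction, sketched here under the assumption of known depths and poses, is a depth-reprojection agreement test; all names are hypothetical.

```python
import torch

def visibility_mask(depth_tgt, depth_ref, K, T_tgt_to_ref, rel_thresh=0.05):
    """Mark target pixels as 'observed' if, after back-projection and
    reprojection into the reference view, the reference depth map agrees
    with the reprojected depth (i.e. the point is not occluded or unseen).

    depth_tgt, depth_ref: (H, W) depth maps
    K:            (3, 3) shared camera intrinsics
    T_tgt_to_ref: (4, 4) rigid transform, target camera -> reference camera
    """
    H, W = depth_tgt.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1).float()
    pts = torch.linalg.inv(K) @ pix * depth_tgt.reshape(1, -1)  # back-project
    pts = T_tgt_to_ref[:3, :3] @ pts + T_tgt_to_ref[:3, 3:4]    # -> ref frame
    proj = K @ pts
    z = proj[2].clamp(min=1e-6)
    u, v = (proj[0] / z).round().long(), (proj[1] / z).round().long()
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (pts[2] > 0)
    mask = torch.zeros(H * W, dtype=torch.bool)
    ref_z = depth_ref[v[inside], u[inside]]
    mask[inside] = (ref_z - z[inside]).abs() / ref_z.clamp(min=1e-6) < rel_thresh
    return mask.reshape(H, W)

# Sanity check: identical flat depths under an identity transform agree everywhere.
K = torch.tensor([[100., 0., 32.], [0., 100., 32.], [0., 0., 1.]])
m = visibility_mask(torch.full((64, 64), 2.0), torch.full((64, 64), 2.0), K, torch.eye(4))
print(m.float().mean())  # -> 1.0
```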
arXiv Detail & Related papers (2025-10-14T03:06:28Z)
- OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting [78.70702961852119]
OracleGS reconciles generative completeness with regressive fidelity for sparse-view Gaussian Splatting. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions.
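The summary describes conditioning the generative prior on multi-view geometric evidence and filtering hallucinations, but not the mechanism. One simple stand-in, sketched below with hypothetical names, scores each pixel by agreement with the warped input views and trusts the generation only where that evidence is weak.

```python
import torch

def multiview_evidence(candidate, warped_inputs, sigma=0.1):
    """Score how strongly the input views constrain each target pixel.

    candidate:     (3, H, W) proposed target view (e.g. a generated frame)
    warped_inputs: (V, 3, H, W) input views warped into the target camera
    Returns an evidence map in [0, 1]: high where at least one warped
    input agrees with the candidate, low in under-constrained regions.
    """
    err = (warped_inputs - candidate.unsqueeze(0)).abs().mean(dim=1)  # (V, H, W)
    return torch.exp(-err.min(dim=0).values / sigma)                  # (H, W)

def filter_generation(candidate, rendered, evidence):
    """Keep the regression-based rendering where geometry vouches for it;
    fall back to the generative completion only where evidence is weak."""
    w = evidence.unsqueeze(0)  # broadcast over color channels
    return w * rendered + (1.0 - w) * candidate

# Toy usage.
cand, rend = torch.rand(3, 32, 32), torch.rand(3, 32, 32)
warped = torch.rand(4, 3, 32, 32)
out = filter_generation(cand, rend, multiview_evidence(cand, warped))
print(out.shape)  # torch.Size([3, 32, 32])
```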
arXiv Detail & Related papers (2025-09-27T11:19:32Z)
- Taming generative video models for zero-shot optical flow extraction [28.176290134216995]
Self-supervised video models trained only for future frame prediction can be prompted, without fine-tuning, to output optical flow. Inspired by the Counterfactual World Model (CWM) paradigm, we extend this idea to generative video models. KL-tracing is a novel test-time procedure that injects a localized perturbation into the first frame, rolls out the model one step, and computes the Kullback-Leibler divergence between the perturbed and unperturbed predictive distributions.
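The KL-tracing recipe above is concrete enough to sketch end to end. The interface below, where the model maps a frame to per-pixel log-probabilities over a discrete codebook, is an assumption about the predictor, not the paper's actual API.

```python
import torch
import torch.nn.functional as F

def kl_trace(model, frame, query_xy, eps=0.5):
    """Inject a localized perturbation at the query point, roll the
    predictive model out one step for clean and perturbed inputs, and
    return the location where their predictive distributions diverge
    most: the traced correspondence of the query point in the next frame.
    """
    x, y = query_xy
    perturbed = frame.clone()
    perturbed[:, y, x] += eps  # localized perturbation in the first frame

    with torch.no_grad():
        logp_clean = model(frame)      # (K, H, W) log p(next token | clean)
        logp_pert = model(perturbed)   # (K, H, W) log p(next token | perturbed)

    # Per-pixel KL(perturbed || clean), summed over the codebook dimension.
    kl = (logp_pert.exp() * (logp_pert - logp_clean)).sum(dim=0)  # (H, W)
    idx = int(kl.flatten().argmax())
    return idx % kl.shape[1], idx // kl.shape[1]  # endpoint; flow = endpoint - query

# Toy check with a random stand-in predictor.
H = W = 16
dummy = lambda f: F.log_softmax(torch.randn(8, H, W), dim=0)
print(kl_trace(dummy, torch.zeros(3, H, W), (4, 4)))
```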
arXiv Detail & Related papers (2025-07-11T23:59:38Z)
- RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors [13.883695200241524]
RI3D is a novel approach that harnesses the power of diffusion models to reconstruct high-quality novel views given a sparse set of input images. Our key contribution is separating the view synthesis process into the two tasks of reconstructing visible regions and hallucinating missing regions. We produce results with detailed textures in both visible and missing regions that outperform state-of-the-art approaches on a diverse set of scenes.
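A minimal sketch of the two-task split described above, with the two diffusion priors abstracted as callables (names and interfaces are hypothetical): the visibility mask routes covered pixels to a repair model and missing ones to an inpainting model.

```python
import torch

def synthesize_view(render_rgb, vis_mask, repair, inpaint):
    """Composite two experts: 'repair' refines regions the input views
    cover; 'inpaint' hallucinates the rest.

    render_rgb: (B, 3, H, W) initial, possibly artifact-laden rendering
    vis_mask:   (B, 1, H, W) 1 = observed by input views, 0 = missing
    """
    repaired = repair(render_rgb)                      # fix visible content
    filled = inpaint(render_rgb * vis_mask, vis_mask)  # complete the holes
    return vis_mask * repaired + (1.0 - vis_mask) * filled

# Toy composition with identity stand-ins for the two priors.
rgb = torch.rand(1, 3, 32, 32)
m = (torch.rand(1, 1, 32, 32) > 0.5).float()
print(synthesize_view(rgb, m, lambda x: x, lambda x, _: x).shape)
```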
arXiv Detail & Related papers (2025-03-13T20:16:58Z)
- SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images [49.7344030427291]
We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. We present SPAR3D, a novel two-stage approach aiming to take the best of both directions.
arXiv Detail & Related papers (2025-01-08T18:52:03Z)
- DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models [67.50989119438508]
We introduce DSplats, a novel method that directly denoises multiview images using Gaussian-based Reconstructors to produce realistic 3D assets. Our experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction.
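One way to read "denoising multiview images using Gaussian-based Reconstructors" is that each denoising step regresses 3D Gaussians from the noisy views and re-renders them as the clean-image estimate, so consistency is enforced by construction. The sketch below wires that into a standard deterministic DDIM update; the reconstructor and renderer are hypothetical stand-ins.

```python
import torch

def dsplat_denoise_step(x_t, t, reconstructor, renderer, cameras, alpha_bar):
    """One 3D-grounded denoising step.

    x_t:       (V, 3, H, W) noisy multiview images at timestep t
    alpha_bar: (T,) cumulative noise schedule, indexed by timestep
    """
    gaussians = reconstructor(x_t, t)       # noisy views -> 3D Gaussians
    x0_hat = renderer(gaussians, cameras)   # re-render all V views: x0 estimate
    # Deterministic DDIM update (eta = 0) toward the rendered estimate.
    a_t, a_prev = alpha_bar[t], alpha_bar[max(t - 1, 0)]
    eps = (x_t - a_t.sqrt() * x0_hat) / (1 - a_t).sqrt()
    return a_prev.sqrt() * x0_hat + (1 - a_prev).sqrt() * eps
```

Routing the x0 prediction through an explicit 3D representation is what makes the denoised views spatially consistent across cameras at every step.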
arXiv Detail & Related papers (2024-12-11T07:32:17Z)
- GECO: Generative Image-to-3D within a SECOnd [51.20830808525894]
We introduce GECO, a novel method for high-quality 3D generative modeling that operates within a second.
GECO achieves high-quality image-to-3D mesh generation with an unprecedented level of efficiency.
arXiv Detail & Related papers (2024-05-30T17:58:00Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
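A sketch of the autoregressive loop this implies: lift the frames generated so far into a 3D feature volume, render volume features into the next camera as a geometry prior, and condition the 2D diffusion backbone on them. All callables are hypothetical stand-ins for the paper's components.

```python
def autoregressive_nvs(model, build_volume, render_features, image0, poses):
    """Generate a 3D-consistent camera trajectory one view at a time.

    model.sample(cond):          2D diffusion backbone, conditioned sampling
    build_volume(images, poses): lifts posed images into a 3D feature volume
    render_features(vol, pose):  projects volume features into a new camera
    """
    images, views = [image0], [poses[0]]
    for pose in poses[1:]:
        vol = build_volume(images, views)   # aggregate everything seen so far
        feat = render_features(vol, pose)   # geometry prior for the new view
        images.append(model.sample(cond=feat))
        views.append(pose)
    return images
```

Feeding each generated view back into the volume is what lets the sequence stay consistent rather than drifting view to view.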
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
- Bridging Implicit and Explicit Geometric Transformation for Single-Image View Synthesis [16.14528024065244]
"seesaw" problem: preserving reprojected contents and completing realistic out-of-view regions.
We propose a single-image view synthesis framework for mitigating the seesaw problem while utilizing an efficient non-autoregressive model.
Our loss function encourages explicit features to improve the reprojected area of implicit features, and implicit features to improve the out-of-view area of explicit features.
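One hedged reading of that loss, sketched below: each branch fits the target, and each additionally teaches the other in the region it handles better, with a stop-gradient so guidance flows one way. The masks, weights, and L1 choice are assumptions.

```python
import torch
import torch.nn.functional as F

def seesaw_loss(pred_explicit, pred_implicit, target, reproj_mask):
    """reproj_mask: (B, 1, H, W), 1 where source content reprojects into
    the target view, 0 in out-of-view regions."""
    # Both branches regress the target...
    rec = F.l1_loss(pred_explicit, target) + F.l1_loss(pred_implicit, target)
    # ...explicit features guide the implicit branch in the reprojected area...
    guide_in = F.l1_loss(pred_implicit * reproj_mask,
                         (pred_explicit * reproj_mask).detach())
    # ...and implicit features guide the explicit branch out of view.
    guide_out = F.l1_loss(pred_explicit * (1 - reproj_mask),
                          (pred_implicit * (1 - reproj_mask)).detach())
    return rec + guide_in + guide_out
```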
arXiv Detail & Related papers (2022-09-15T07:35:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.