GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction
- URL: http://arxiv.org/abs/2512.25073v1
- Date: Wed, 31 Dec 2025 18:59:55 GMT
- Title: GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction
- Authors: Yi-Chuan Huang, Hao-Jen Chien, Chin-Yang Lin, Ying-Huan Chen, Yu-Lun Liu
- Abstract summary: GaMO is a framework that reformulates sparse-view reconstruction through multi-view outpainting. Our approach employs multi-view conditioning and geometry-aware denoising strategies in a zero-shot manner without training.
- Score: 6.362401262063673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in 3D reconstruction have achieved remarkable progress in high-quality scene capture from dense multi-view imagery, yet struggle when input views are limited. Various approaches, including regularization techniques, semantic priors, and geometric constraints, have been implemented to address this challenge. Latest diffusion-based methods have demonstrated substantial improvements by generating novel views from new camera poses to augment training data, surpassing earlier regularization and prior-based techniques. Despite this progress, we identify three critical limitations in these state-of-the-art approaches: inadequate coverage beyond known view peripheries, geometric inconsistencies across generated views, and computationally expensive pipelines. We introduce GaMO (Geometry-aware Multi-view Outpainter), a framework that reformulates sparse-view reconstruction through multi-view outpainting. Instead of generating new viewpoints, GaMO expands the field of view from existing camera poses, which inherently preserves geometric consistency while providing broader scene coverage. Our approach employs multi-view conditioning and geometry-aware denoising strategies in a zero-shot manner without training. Extensive experiments on Replica and ScanNet++ demonstrate state-of-the-art reconstruction quality across 3, 6, and 9 input views, outperforming prior methods in PSNR and LPIPS, while achieving a $25\times$ speedup over SOTA diffusion-based methods with processing time under 10 minutes. Project page: https://yichuanh.github.io/GaMO/
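The abstract does not give GaMO's implementation details, but the core geometric idea of outpainting from existing camera poses can be illustrated generically: padding the image canvas while keeping the pose fixed only shifts the principal point of the pinhole intrinsics, so every already-reconstructed 3D point reprojects to a predictably translated pixel. A minimal sketch (hypothetical intrinsics `K` and padding value, not the paper's code):

```python
import numpy as np

def pad_intrinsics(K, pad):
    """Expand the field of view by padding the canvas by `pad` pixels
    on each side. The camera pose is unchanged; only the principal
    point shifts, so existing geometry stays consistent."""
    K2 = K.copy()
    K2[0, 2] += pad  # cx
    K2[1, 2] += pad  # cy
    return K2

def project(K, X):
    """Pinhole projection of a 3D point X (camera frame) to pixels."""
    x = K @ X
    return x[:2] / x[2]

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
X = np.array([0.2, -0.1, 2.0])

u0 = project(K, X)                      # pixel in the original image
u1 = project(pad_intrinsics(K, 64), X)  # pixel in the padded canvas
# u1 - u0 == (64, 64): every scene point shifts by exactly the padding,
# which is why outpainting from a fixed pose preserves geometric consistency.
```

This is the key contrast with novel-view generation: the diffusion model only has to fill the padded border, not hallucinate geometry under a new pose.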
Related papers
- AnchoredDream: Zero-Shot 360° Indoor Scene Generation from a Single View via Geometric Grounding [58.90269958632018]
Single-view indoor scene generation plays a crucial role in a range of real-world applications. Recent approaches have made progress by leveraging diffusion models and depth estimation networks. We propose AnchoredDream, a novel zero-shot pipeline that anchors 360° scene generation on high-fidelity geometry.
arXiv Detail & Related papers (2026-01-23T08:08:12Z) - G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior [53.762256749551284]
We identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. We incorporate this geometry guidance throughout the generative pipeline to improve visibility mask estimation, guide novel view selection, and enhance multi-view consistency when inpainting with video diffusion models. Our method naturally supports single-view inputs and unposed videos, with strong generalizability in both indoor and outdoor scenarios.
arXiv Detail & Related papers (2025-10-14T03:06:28Z) - ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors [37.455535904703204]
We propose a 3DGS-based pipeline that generates additional training views to enhance reconstruction. Fine-tuning 3D Gaussians with these enhanced views significantly improves reconstruction quality. Experiments demonstrate that our approach outperforms existing 3DGS-based methods.
arXiv Detail & Related papers (2025-08-08T05:01:17Z) - Zero-P-to-3: Zero-Shot Partial-View Images to 3D Object [55.93553895520324]
We propose a novel training-free approach that integrates local dense observations and multi-source priors for reconstruction. Our method introduces a fusion-based strategy to effectively align these priors in DDIM sampling, thereby generating multi-view consistent images to supervise invisible views.
arXiv Detail & Related papers (2025-05-29T03:51:37Z) - ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image [4.366356163044466]
Existing methods are often limited to reconstructing low-consistency 3D scenes with narrow fields of view from single-view input. We propose ExScene, a two-stage pipeline to reconstruct an immersive 3D scene from any given single-view image. ExScene achieves consistent and immersive scene reconstruction using only single-view input.
arXiv Detail & Related papers (2025-03-31T09:33:22Z) - Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications. Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas. We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z) - Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion [27.836518920611557]
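The Score Distillation Sampling (SDS) objective named in the DP-Recon entry has a standard generic form: noise the current render to a diffusion timestep, query the pretrained prior for its noise estimate, and use the residual as a gradient on the render. A minimal sketch with a stub denoiser standing in for a real pretrained model (the denoiser and schedule value here are illustrative assumptions, not DP-Recon's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def stub_denoiser(x_t, t):
    # Placeholder for a pretrained diffusion model's noise prediction
    # eps_hat(x_t, t); a real prior would be text- or image-conditioned.
    return 0.2 * x_t

def sds_grad(x, t, alpha_bar, w=1.0):
    """Generic SDS gradient for a rendered image x.
    Forward-diffuse x to timestep t, ask the prior for its noise
    estimate, and return w * (eps_hat - eps), which pulls x toward
    images the prior considers likely."""
    eps = rng.standard_normal(x.shape)
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    eps_hat = stub_denoiser(x_t, t)
    return w * (eps_hat - eps)

x = rng.standard_normal((3, 8, 8))   # a toy "rendered image"
g = sds_grad(x, t=500, alpha_bar=0.5)
# In a real pipeline, g is backpropagated through the differentiable
# renderer to the scene parameters (NeRF weights or Gaussians).
```

Note that the gradient bypasses the denoiser's Jacobian, which is what makes SDS cheap enough to run per optimization step.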
We introduce MVGD, a diffusion-based architecture capable of direct pixel-level generation of images and depth maps from novel viewpoints. We train this model on a collection of more than 60 million multi-view samples from publicly available datasets. We report state-of-the-art results in multiple novel view synthesis benchmarks, as well as multi-view stereo and video depth estimation.
arXiv Detail & Related papers (2025-01-30T23:43:06Z) - Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z) - Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z) - Solving Inverse Problems with NerfGANs [88.24518907451868]
We introduce a novel framework for solving inverse problems using NeRF-style generative models.
We show that naively optimizing the latent space leads to artifacts and poor novel view rendering.
We propose a novel radiance field regularization method to obtain better 3-D surfaces and improved novel views given single view observations.
arXiv Detail & Related papers (2021-12-16T17:56:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality or accuracy of the listed information and is not responsible for any consequences of its use.