Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
- URL: http://arxiv.org/abs/2405.16517v2
- Date: Sun, 2 Jun 2024 22:05:39 GMT
- Title: Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors
- Authors: Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen
- Abstract summary: We tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM).
We present SparseSplat360, a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views.
Our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
- Score: 51.36238367193988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We aim to tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diffusion models can strongly improve the reconstruction of a scene with low-cost fine-tuning. Specifically, we present SparseSplat360 (Sp2360), a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views. Due to superior training and rendering speeds, we use an explicit scene representation in the form of 3D Gaussians over NeRF-based implicit representations. We propose an iterative update strategy to fuse generated pseudo novel views with existing 3D Gaussians fitted to the initial sparse inputs. As a result, we obtain a multi-view consistent scene representation with details coherent with the observed inputs. Our evaluation on the challenging Mip-NeRF360 dataset shows that our proposed 2D to 3D distillation algorithm considerably improves the performance of a regularized version of 3DGS adapted to a sparse-view setting and outperforms existing sparse-view reconstruction methods in 360 scene reconstruction. Qualitatively, our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.
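To make the abstract's iterative update strategy concrete, the sketch below outlines the 2D-to-3D distillation loop in Python. It is a minimal illustration under assumed interfaces, not the authors' implementation: `fit_gaussians`, `render`, `inpaint`, and `remove_artifacts` are hypothetical placeholders standing in for the regularized 3DGS optimizer and the two fine-tuned diffusion models in the cascade.

```python
# Minimal sketch of the iterative 2D-to-3D distillation loop described in the
# abstract. All callables here are hypothetical placeholders (not the authors'
# code): `fit_gaussians` stands in for regularized 3DGS optimization, while
# `inpaint` and `remove_artifacts` stand in for the cascaded diffusion models.

from typing import Callable, List, Sequence, Tuple


def sp2360_distill(
    sparse_views: Sequence[Tuple[object, object]],  # e.g. 9 posed inputs as (pose, image)
    novel_poses: Sequence[object],                  # camera poses to hallucinate
    fit_gaussians: Callable,                        # posed images -> 3D Gaussian scene
    render: Callable,                               # (scene, pose) -> rendered image
    inpaint: Callable,                              # fills missing regions in a render
    remove_artifacts: Callable,                     # cleans residual splatting artifacts
    num_rounds: int = 4,
):
    """Iteratively fuse generated pseudo novel views with the 3D Gaussians."""
    train_views: List[Tuple[object, object]] = list(sparse_views)
    scene = fit_gaussians(train_views)  # initial fit to the sparse inputs

    for _ in range(num_rounds):
        for pose in novel_poses:
            raw = render(scene, pose)          # incomplete, artifact-prone view
            filled = inpaint(raw)              # cascade stage 1: fill in missing detail
            clean = remove_artifacts(filled)   # cascade stage 2: clean the render
            train_views.append((pose, clean))  # treat as a pseudo ground-truth view
        # Re-fit so the generated views stay consistent with the observed inputs.
        scene = fit_gaussians(train_views)

    return scene
```

The number of rounds and the selection of novel poses would follow the paper's own schedule; `num_rounds = 4` here is an arbitrary default chosen only for illustration.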
Related papers
- IM360: Textured Mesh Reconstruction for Large-scale Indoor Mapping with 360$^\circ$ Cameras [53.53895891356167]
We present a novel 3D reconstruction pipeline for 360$^\circ$ cameras, enabling 3D mapping and rendering of indoor environments.
Our approach (IM360) leverages the wide field of view of omnidirectional images and integrates the spherical camera model into every core component of the SfM pipeline.
We evaluate our pipeline on large-scale indoor scenes from the Matterport3D and Stanford2D3D datasets.
arXiv Detail & Related papers (2025-02-18T05:15:19Z) - AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting [15.177483700681377]
Three-dimensional scene inpainting is crucial for applications from virtual reality to architectural visualization.
We present AuraFusion360, a novel reference-based method that enables high-quality object removal and hole filling in 3D scenes represented by Gaussian Splatting.
We also introduce 360-USID, the first comprehensive dataset for 360° unbounded scene inpainting with ground truth.
arXiv Detail & Related papers (2025-02-07T18:59:55Z) - Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors [5.407319151576265]
We introduce a generative approach for pose-free reconstruction of 360$^\circ$ scenes from a limited number of uncalibrated 2D images.
We propose an instruction-following RGBD diffusion model designed to inpaint missing details and remove artifacts in novel view renders and depth maps of a 3D scene.
arXiv Detail & Related papers (2024-11-24T19:34:58Z) - SCube: Instant Large-Scale Scene Reconstruction using VoxSplats [55.383993296042526]
We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images.
Our method encodes reconstructed scenes using a novel representation VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold.
arXiv Detail & Related papers (2024-10-26T00:52:46Z) - LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors [34.91966359570867]
Sparse-view reconstruction is inherently ill-posed and under-constrained.
We introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images.
Our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods.
arXiv Detail & Related papers (2024-09-05T12:09:02Z) - ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model [16.14713604672497]
ReconX is a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task.
The proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition.
Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency.
arXiv Detail & Related papers (2024-08-29T17:59:40Z) - AugGS: Self-augmented Gaussians with Structural Masks for Sparse-view 3D Reconstruction [9.953394373473621]
Sparse-view 3D reconstruction is a major challenge in computer vision.
We propose a self-augmented two-stage Gaussian splatting framework enhanced with structural masks for sparse-view 3D reconstruction.
Our approach achieves state-of-the-art performance in perceptual quality and multi-view consistency with sparse inputs.
arXiv Detail & Related papers (2024-08-09T03:09:22Z) - DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z) - DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting [56.101576795566324]
We present a text-to-3D 360$^\circ$ scene generation pipeline.
Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement.
Our method offers a globally consistent 3D scene within a 360$^\circ$ perspective.
arXiv Detail & Related papers (2024-04-10T10:46:59Z) - LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
arXiv Detail & Related papers (2021-04-01T15:48:41Z)