Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View
- URL: http://arxiv.org/abs/2404.03421v1
- Date: Thu, 4 Apr 2024 12:58:46 GMT
- Title: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View
- Authors: Andreea Dogaru, Mert Özer, Bernhard Egger,
- Abstract summary: Single-view 3D reconstruction is currently approached from two dominant perspectives.
We propose a hybrid method following a divide-and-conquer strategy.
We first process the scene holistically, extracting depth and semantic information.
We then leverage a single-shot object-level method for the detailed reconstruction of individual components.
- Score: 5.222115919729418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view 3D reconstruction is currently approached from two dominant perspectives: reconstruction of scenes with limited diversity using 3D data supervision or reconstruction of diverse singular objects using large image priors. However, real-world scenarios are far more complex and exceed the capabilities of these methods. We therefore propose a hybrid method following a divide-and-conquer strategy. We first process the scene holistically, extracting depth and semantic information, and then leverage a single-shot object-level method for the detailed reconstruction of individual components. By following a compositional processing approach, the overall framework achieves full reconstruction of complex 3D scenes from a single image. We purposely design our pipeline to be highly modular by carefully integrating specific procedures for each processing step, without requiring an end-to-end training of the whole system. This enables the pipeline to naturally improve as future methods can replace the individual modules. We demonstrate the reconstruction performance of our approach on both synthetic and real-world scenes, comparing favorable against prior works. Project page: https://andreeadogaru.github.io/Gen3DSR.
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z) - REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment [23.733856513456]
We present REPARO, a novel approach for compositional 3D asset generation from single images.
REPARO employs a two-step process: first, it extracts individual objects from the scene and reconstructs their 3D meshes using off-the-shelf image-to-3D models.
It then optimize the layout of these meshes through differentiable rendering techniques, ensuring coherent scene composition.
arXiv Detail & Related papers (2024-05-28T18:45:10Z) - Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space.
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z) - Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition.
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z) - 3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface [8.824340350342512]
3DFIRES is a novel system for scene-level 3D reconstruction from posed images.
We show it matches the efficacy of single-view reconstruction methods with only one input.
arXiv Detail & Related papers (2024-03-13T17:59:50Z) - TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using
Differentiable Rendering [54.35405028643051]
We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone.
Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps.
We adopt the neural implicit surface reconstruction method, which allows for high-quality mesh.
arXiv Detail & Related papers (2023-03-27T10:07:52Z) - VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View
Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion.
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z) - Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies
from Single RGB Images [5.775625085664381]
We introduce an approach that accurately reconstructs 3D human poses and detailed 3D full-body geometric models from single images in realtime.
Key idea of our approach is a novel end-to-end multi-task deep learning framework that uses single images to predict five outputs simultaneously.
We show the system advances the frontier of 3D human body and pose reconstruction from single images by quantitative evaluations and comparisons with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-22T04:26:11Z) - Reconstructing Small 3D Objects in front of a Textured Background [0.0]
We present a technique for a complete 3D reconstruction of small objects moving in front of a textured background.
It is a particular variation of multibody structure from motion, which specializes to two objects only.
In experiments with real artifacts, we show that our approach has practical advantages when reconstructing 3D objects from all sides.
arXiv Detail & Related papers (2021-05-24T15:36:33Z) - A Divide et Impera Approach for 3D Shape Reconstruction from Multiple
Views [49.03830902235915]
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.
This paper proposes to rely on viewpoint variant reconstructions by merging the visible information from the given views.
To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
arXiv Detail & Related papers (2020-11-17T09:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.