Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View
- URL: http://arxiv.org/abs/2404.03421v1
- Date: Thu, 4 Apr 2024 12:58:46 GMT
- Title: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View
- Authors: Andreea Dogaru, Mert Özer, Bernhard Egger
- Abstract summary: Single-view 3D reconstruction is currently approached from two dominant perspectives.
We propose a hybrid method following a divide-and-conquer strategy.
We first process the scene holistically, extracting depth and semantic information.
We then leverage a single-shot object-level method for the detailed reconstruction of individual components.
- Score: 5.222115919729418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view 3D reconstruction is currently approached from two dominant perspectives: reconstruction of scenes with limited diversity using 3D data supervision or reconstruction of diverse singular objects using large image priors. However, real-world scenarios are far more complex and exceed the capabilities of these methods. We therefore propose a hybrid method following a divide-and-conquer strategy. We first process the scene holistically, extracting depth and semantic information, and then leverage a single-shot object-level method for the detailed reconstruction of individual components. By following a compositional processing approach, the overall framework achieves full reconstruction of complex 3D scenes from a single image. We purposely design our pipeline to be highly modular by carefully integrating specific procedures for each processing step, without requiring an end-to-end training of the whole system. This enables the pipeline to naturally improve as future methods can replace the individual modules. We demonstrate the reconstruction performance of our approach on both synthetic and real-world scenes, comparing favorably against prior works. Project page: https://andreeadogaru.github.io/Gen3DSR.
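As a rough illustration of the divide-and-conquer structure described in the abstract, the sketch below wires together three interchangeable stages: holistic depth and instance estimation, per-object single-shot reconstruction, and depth-guided composition. The arguments `estimate_depth`, `segment_instances`, and `reconstruct_object` are hypothetical stand-ins for off-the-shelf modules, and `align_to_depth` is a crude placeholder for the paper's actual composition step, not the authors' API.

```python
import numpy as np

def align_to_depth(vertices, mask, depth):
    """Crude placeholder for scene composition: shift an object's
    vertices to the median depth of its 2D mask."""
    z = np.median(depth[mask > 0])
    return vertices + np.array([0.0, 0.0, z - vertices[:, 2].mean()])

def reconstruct_scene(image, estimate_depth, segment_instances, reconstruct_object):
    depth = estimate_depth(image)        # holistic stage: dense depth map
    masks = segment_instances(image)     # holistic stage: per-object masks
    scene = []
    for mask in masks:                   # "divide": handle one object at a time
        crop = image * mask[..., None]   # isolate the object in image space
        verts = reconstruct_object(crop) # single-shot object-level 3D model
        scene.append(align_to_depth(verts, mask, depth))  # "conquer": compose
    return scene
```

Because each stage is a plain callable, any module can be swapped without retraining the rest, which mirrors the modularity argument made in the abstract.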
Related papers
- EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild [79.71523320368388]
Our work aims to reconstruct hand-object interactions from a single-view image.
We first design a novel pipeline to estimate the underlying hand pose and object shape.
With the initial reconstruction, we employ a prior-guided optimization scheme (a toy variant is sketched below).
arXiv Detail & Related papers (2024-11-21T16:33:35Z)
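To make the EasyHOI summary concrete, here is a hedged toy version of a prior-guided refinement step: given point clouds from independent hand and object estimates, the object's translation is optimized so the nearest surfaces come into contact. The contact term (mean of the k smallest point distances) is an illustrative assumption, not the paper's actual prior.

```python
import torch

def refine_object_pose(hand_pts, obj_pts, steps=200, lr=1e-2, k=32):
    """hand_pts: (Nh, 3), obj_pts: (No, 3) float tensors."""
    t = torch.zeros(3, requires_grad=True)   # object translation to optimize
    opt = torch.optim.Adam([t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # distance from each object point to its nearest hand point
        d = torch.cdist(obj_pts + t, hand_pts).min(dim=1).values
        # pull the k closest object points toward the hand surface
        loss = torch.topk(d, k=min(k, d.numel()), largest=False).values.mean()
        loss.backward()
        opt.step()
    return t.detach()
```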
- REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment [23.733856513456]
We present REPARO, a novel approach for compositional 3D asset generation from single images.
REPARO employs a two-step process: first, it extracts individual objects from the scene and reconstructs their 3D meshes using off-the-shelf image-to-3D models.
It then optimizes the layout of these meshes through differentiable rendering techniques, ensuring coherent scene composition (a toy version of this step is sketched below).
arXiv Detail & Related papers (2024-05-28T18:45:10Z)
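A toy stand-in for REPARO's layout-optimization step: per-object translations are adjusted so that projected object centers match their observed 2D locations. A faithful implementation would instead compare rendered silhouettes through a differentiable renderer; the pinhole projection and loss here are simplifications.

```python
import torch

def optimize_layout(centers3d, target_px, focal=500.0, steps=300, lr=1e-2):
    """centers3d: (N, 3) initial object centers; target_px: (N, 2) pixel targets."""
    offsets = torch.zeros_like(centers3d, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        c = centers3d + offsets
        proj = focal * c[:, :2] / c[:, 2:3].clamp(min=1e-3)  # pinhole projection
        loss = ((proj - target_px) ** 2).mean()              # 2D alignment error
        loss.backward()
        opt.step()
    return (centers3d + offsets).detach()
```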
- Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image.
We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space (a generic form of this objective is sketched below).
A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
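The contrastive component of Part123 can be illustrated with a generic supervised-contrastive objective over per-pixel features: features sharing a 2D part label are pulled together, others pushed apart. This loss form is an assumption for illustration; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def part_contrastive_loss(feats, part_ids, temperature=0.1):
    """feats: (N, D) pixel features; part_ids: (N,) integer 2D part labels."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.T / temperature               # pairwise similarities
    eye = torch.eye(len(feats), dtype=torch.bool)
    pos = (part_ids[:, None] == part_ids[None, :]) & ~eye
    # per-pair log-probability, excluding self-similarity from the denominator
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")),
                                     dim=1, keepdim=True)
    # average log-likelihood of same-part pairs per anchor
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```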
- Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction [51.3632308129838]
We present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction.
Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition (an illustrative region-growing step is sketched below).
We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing.
arXiv Detail & Related papers (2024-03-28T11:12:33Z)
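The mesh-based region growing mentioned in the Total-Decom summary might look like the following breadth-first sketch: starting from seed faces (for example, faces whose projections fall inside a SAM mask), the region expands while neighboring face normals stay aligned. The seeding strategy and the cosine threshold are illustrative assumptions.

```python
import numpy as np
from collections import deque

def grow_region(face_normals, adjacency, seeds, cos_thresh=0.9):
    """face_normals: (F, 3) unit normals; adjacency: dict face -> neighbor faces."""
    region, queue = set(seeds), deque(seeds)
    while queue:
        f = queue.popleft()
        for n in adjacency[f]:
            # expand only across nearly coplanar neighboring faces
            if n not in region and float(face_normals[f] @ face_normals[n]) > cos_thresh:
                region.add(n)
                queue.append(n)
    return region
```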
- 3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surface [8.824340350342512]
3DFIRES is a novel system for scene-level 3D reconstruction from posed images.
We show that it matches the efficacy of single-view reconstruction methods with only one input view.
arXiv Detail & Related papers (2024-03-13T17:59:50Z)
- TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering [54.35405028643051]
We present a new pipeline for acquiring a textured mesh in the wild with a single smartphone.
Our method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps (a simplified filtering step is sketched below).
We adopt a neural implicit surface reconstruction method, which allows for high-quality meshes.
arXiv Detail & Related papers (2023-03-27T10:07:52Z)
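The depth-filtering idea in the TMO summary can be sketched as a consistency check between the phone's depth map and sparse structure-from-motion points: sensor depths that disagree with triangulated depths at matched pixels are invalidated. The relative-error threshold, and checking only at sparse points, are simplifications of whatever the paper actually does.

```python
import numpy as np

def filter_depth(depth, sfm_uv, sfm_depth, rel_thresh=0.05):
    """depth: (H, W) sensor depth; sfm_uv: (N, 2) integer pixel coordinates
    of triangulated points; sfm_depth: (N,) their depths in this view."""
    u, v = sfm_uv[:, 0], sfm_uv[:, 1]
    err = np.abs(depth[v, u] - sfm_depth) / np.maximum(sfm_depth, 1e-6)
    filtered = depth.copy()
    bad = err > rel_thresh
    filtered[v[bad], u[bad]] = 0.0   # invalidate inconsistent measurements
    return filtered
```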
- VoRTX: Volumetric 3D Reconstruction With Transformers for Voxelwise View Selection and Fusion [68.68537312256144]
VoRTX is an end-to-end volumetric 3D reconstruction network using transformers for wide-baseline, multi-view feature fusion (a minimal fusion block is sketched below).
We train our model on ScanNet and show that it produces better reconstructions than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-01T02:18:11Z)
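A minimal sketch of transformer-based multi-view fusion in the spirit of VoRTX: each voxel carries one feature vector per view, and self-attention across the view axis produces a fused feature. The dimensions and the mean pooling are illustrative choices, not the paper's exact architecture.

```python
import torch
from torch import nn

class ViewFusion(nn.Module):
    """Fuse V per-view features into one feature per voxel via self-attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, view_feats):
        # view_feats: (num_voxels, V, dim) back-projected image features
        fused, _ = self.attn(view_feats, view_feats, view_feats)
        return fused.mean(dim=1)  # pool across views -> (num_voxels, dim)

out = ViewFusion()(torch.randn(1024, 5, 64))  # 1024 voxels seen from 5 views
```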
- Deep3DPose: Realtime Reconstruction of Arbitrarily Posed Human Bodies from Single RGB Images [5.775625085664381]
We introduce an approach that accurately reconstructs 3D human poses and detailed 3D full-body geometric models from single images in realtime.
The key idea of our approach is a novel end-to-end multi-task deep learning framework that uses single images to predict five outputs simultaneously (a generic multi-head layout is sketched below).
We show that the system advances the frontier of 3D human body and pose reconstruction from single images through quantitative evaluations and comparisons with state-of-the-art methods.
arXiv Detail & Related papers (2021-06-22T04:26:11Z)
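The multi-task framework in the Deep3DPose summary suggests a shared backbone feeding several prediction heads. The summary does not name the five outputs, so the head dimensions below are placeholders; the tiny backbone is likewise only for illustration.

```python
import torch
from torch import nn

class MultiTaskNet(nn.Module):
    """One shared image backbone, five simultaneous prediction heads."""
    def __init__(self, feat_dim=512, out_dims=(10, 72, 3, 2, 1)):  # placeholder dims
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleList(nn.Linear(feat_dim, d) for d in out_dims)

    def forward(self, image):
        f = self.backbone(image)               # shared features
        return [head(f) for head in self.heads]  # five outputs, one per task

outs = MultiTaskNet()(torch.randn(1, 3, 224, 224))  # list of five tensors
```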
- Reconstructing Small 3D Objects in front of a Textured Background [0.0]
We present a technique for a complete 3D reconstruction of small objects moving in front of a textured background.
It is a variation of multibody structure from motion that specializes to exactly two objects.
In experiments with real artifacts, we show that our approach has practical advantages when reconstructing 3D objects from all sides.
arXiv Detail & Related papers (2021-05-24T15:36:33Z)
- A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views [49.03830902235915]
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.
This paper proposes to rely on viewpoint-variant reconstructions by merging the visible information from the given views.
To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
arXiv Detail & Related papers (2020-11-17T09:59:32Z)