Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
- URL: http://arxiv.org/abs/2002.12212v1
- Date: Thu, 27 Feb 2020 16:00:52 GMT
- Title: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
- Authors: Yinyu Nie, Xiaoguang Han, Shihui Guo, Yujian Zheng, Jian Chang, Jian Jun Zhang
- Abstract summary: We propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image.
Our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components.
Experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods.
- Score: 24.99186733297264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic reconstruction of indoor scenes refers to both scene understanding and object reconstruction. Existing works either address one part of this problem or focus on independent objects. In this paper, we bridge the gap between understanding and reconstruction, and propose an end-to-end solution to jointly reconstruct room layout, object bounding boxes and meshes from a single image. Instead of separately resolving scene understanding and object reconstruction, our method builds upon a holistic scene context and proposes a coarse-to-fine hierarchy with three components: 1. room layout with camera pose; 2. 3D object bounding boxes; 3. object meshes. We argue that understanding the context of each component can assist the task of parsing the others, which enables joint understanding and reconstruction. The experiments on the SUN RGB-D and Pix3D datasets demonstrate that our method consistently outperforms existing methods in indoor layout estimation, 3D object detection and mesh reconstruction.
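As a rough illustration of the coarse-to-fine hierarchy in the abstract above, the following PyTorch sketch wires the three components into one jointly trainable model. All module names, feature dimensions, and output parametrizations are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the three-stage hierarchy: layout + camera pose,
# per-object 3D boxes, per-object meshes, trained end to end.
import torch
import torch.nn as nn

class LayoutNet(nn.Module):
    """Stage 1: camera pose and a 3D layout box from full-image features."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim, 3 + 7)   # assumed: pose angles + layout box

    def forward(self, img_feat):
        out = self.head(img_feat)
        return out[:, :3], out[:, 3:]             # camera pose, layout box

class ObjectNet(nn.Module):
    """Stage 2: a 3D bounding box per detected object."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.head = nn.Linear(feat_dim, 7)        # assumed: center(3) + size(3) + yaw

    def forward(self, roi_feat):
        return self.head(roi_feat)

class MeshNet(nn.Module):
    """Stage 3: a mesh per object, as offsets of a template's vertices."""
    def __init__(self, feat_dim=2048, n_verts=2562):
        super().__init__()
        self.n_verts = n_verts
        self.head = nn.Linear(feat_dim, n_verts * 3)

    def forward(self, roi_feat):
        return self.head(roi_feat).view(-1, self.n_verts, 3)

# End-to-end use: all three stages share gradients, so scene context
# from one component can inform the others during training.
img_feat, roi_feat = torch.randn(1, 2048), torch.randn(5, 2048)
camera_pose, layout_box = LayoutNet()(img_feat)
object_boxes = ObjectNet()(roi_feat)
object_meshes = MeshNet()(roi_feat)
```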
Related papers
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that, despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
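A point-based stand-in for the rearrangement insight of the entry above: K object components plus per-object placements compose a scene, and a rearranged layout must still look like a valid scene. The scene model and the scoring step are placeholder assumptions, not the paper's generative model.

```python
import torch

def compose_scene(object_points, layout):
    """object_points: list of (N_i, 3) point sets in canonical object frames.
    layout: (K, 3) per-object translations (a real system would use full poses)."""
    return torch.cat([pts + t for pts, t in zip(object_points, layout)], dim=0)

K = 3
objects = [torch.randn(128, 3) * 0.1 for _ in range(K)]   # candidate components
layout_a = torch.rand(K, 3)                               # original placement
layout_b = layout_a[torch.randperm(K)]                    # spatial rearrangement
scene_a = compose_scene(objects, layout_a)
scene_b = compose_scene(objects, layout_b)
# Training would require a scene prior to rate BOTH compositions as valid,
# which pushes each component to be a complete, self-contained object.
```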
- Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems: shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z)
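The cross-view step of the FORGE entry above might look roughly like this sketch: lift each view to a 3D feature volume, warp the volumes into a shared frame with estimated relative poses, and fuse. Shapes and the grid_sample-based warp are assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def warp_volume(vol, R, t, D=32):
    """vol: (1, C, D, D, D) per-view feature volume; R: (3, 3), t: (3,) pose estimate."""
    lin = torch.linspace(-1, 1, D)
    zz, yy, xx = torch.meshgrid(lin, lin, lin, indexing="ij")
    grid = torch.stack([xx, yy, zz], dim=-1).reshape(-1, 3)   # target voxel centers
    grid = (grid @ R.T + t).reshape(1, D, D, D, 3)            # move into the view's frame
    return F.grid_sample(vol, grid, align_corners=True)       # resample the volume

views = [torch.randn(1, 16, 32, 32, 32) for _ in range(3)]   # per-view 3D features
poses = [(torch.eye(3), torch.zeros(3)) for _ in range(3)]   # estimated relative poses
fused = torch.stack([warp_volume(v, R, t) for v, (R, t) in zip(views, poses)]).mean(0)
```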
- Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes [50.317223783035075]
We present a new framework to reconstruct holistic 3D indoor scenes from single-view images.
We propose an instance-aligned implicit function (InstPIFu) for detailed object reconstruction.
Our code and model will be made publicly available.
arXiv Detail & Related papers (2022-07-18T14:54:57Z)
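A pixel-aligned implicit query in the spirit of the InstPIFu entry above could be sketched as follows: each 3D point is projected into the image, the feature map is sampled at that pixel, and an MLP predicts occupancy. The camera model, feature sizes, and MLP layout are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitHead(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feat_map, pts, K):
        """feat_map: (1, C, H, W); pts: (N, 3) in the camera frame; K: (3, 3) intrinsics."""
        uv = pts @ K.T                         # pinhole projection
        uv = uv[:, :2] / uv[:, 2:3]
        uv = uv / torch.tensor([feat_map.shape[-1], feat_map.shape[-2]]) * 2 - 1
        feat = F.grid_sample(feat_map, uv.view(1, 1, -1, 2),
                             align_corners=True)[0, :, 0].T   # (N, C) pixel-aligned features
        return torch.sigmoid(self.mlp(torch.cat([feat, pts], dim=-1)))  # (N, 1) occupancy

head = ImplicitHead()
occ = head(torch.randn(1, 64, 60, 80), torch.rand(100, 3) + 0.5,
           torch.tensor([[60., 0, 40], [0, 60., 30], [0, 0, 1]]))
```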
- TC-SfM: Robust Track-Community-Based Structure-from-Motion [24.956499348500763]
We propose to exploit high-level information in the scene, i.e., the spatial contextual information of local regions, to guide the reconstruction.
A novel structure, the track-community, is proposed, in which each community consists of a group of tracks and represents a local segment of the scene.
arXiv Detail & Related papers (2022-06-13T01:09:12Z)
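One plausible reading of the track-community structure from the entry above, sketched below: tracks that are co-visible in enough images are unioned into one community, approximating a local scene segment. The co-visibility criterion and threshold are assumptions; the paper's own grouping may differ.

```python
import itertools

def track_communities(tracks, min_shared_views=3):
    """tracks: dict of track_id -> set of image ids observing that track."""
    ids = list(tracks)
    parent = {t: t for t in ids}
    def find(t):                               # union-find with path halving
        while parent[t] != t:
            parent[t] = parent[parent[t]]; t = parent[t]
        return t
    for a, b in itertools.combinations(ids, 2):
        if len(tracks[a] & tracks[b]) >= min_shared_views:
            parent[find(a)] = find(b)          # union co-visible tracks
    groups = {}
    for t in ids:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())

# Tracks 0 and 1 share three views, so they form one community; 2 is alone.
print(track_communities({0: {1, 2, 3, 4}, 1: {2, 3, 4}, 2: {7, 8, 9}}))
```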
- Reconstructing Small 3D Objects in front of a Textured Background [0.0]
We present a technique for a complete 3D reconstruction of small objects moving in front of a textured background.
It is a variation of multibody structure from motion that specializes to exactly two objects.
In experiments with real artifacts, we show that our approach has practical advantages when reconstructing 3D objects from all sides.
arXiv Detail & Related papers (2021-05-24T15:36:33Z)
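A hedged sketch of the two-object setting from the entry above: fit one epipolar geometry to all correspondences (the dominant textured background), then fit a second one to its outliers (the moving object). This uses OpenCV's RANSAC-based fundamental-matrix estimator; the thresholds are assumptions, and the paper's actual pipeline may differ.

```python
import cv2

def split_two_motions(pts1, pts2, thresh=1.0):
    """pts1, pts2: (N, 2) float arrays of matched points across two images, N >= 16."""
    # First model absorbs the dominant (background) rigid motion.
    F_bg, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, thresh, 0.999)
    bg = mask.ravel().astype(bool)
    # Second model explains the remaining correspondences (the moving object).
    F_obj, _ = cv2.findFundamentalMat(pts1[~bg], pts2[~bg],
                                      cv2.FM_RANSAC, thresh, 0.999)
    return (F_bg, bg), F_obj
```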
- Holistic 3D Scene Understanding from a Single Image with Implicit Representation [112.40630836979273]
We present a new pipeline for holistic 3D scene understanding from a single image.
We propose an image-based local structured implicit network to improve the object shape estimation.
We also refine 3D object pose and scene layout via a novel implicit scene graph neural network.
arXiv Detail & Related papers (2021-03-11T02:52:46Z)
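The implicit scene graph refinement in the entry above could plausibly be a round of message passing over object and layout nodes, as in this sketch; the dimensions, the fully connected graph, and the residual update are assumptions.

```python
import torch
import torch.nn as nn

class SceneGraphRefine(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU())
        self.upd = nn.Linear(2 * d, d)

    def forward(self, nodes):
        """nodes: (N, d) pose/layout embeddings, one row per object or layout node."""
        n = nodes.shape[0]
        src = nodes.unsqueeze(1).expand(n, n, -1)
        dst = nodes.unsqueeze(0).expand(n, n, -1)
        m = self.msg(torch.cat([src, dst], dim=-1)).mean(dim=0)  # aggregate messages
        return nodes + self.upd(torch.cat([nodes, m], dim=-1))   # residual pose update

refined = SceneGraphRefine()(torch.randn(6, 32))  # e.g. 5 objects + 1 layout node
```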
- A Divide et Impera Approach for 3D Shape Reconstruction from Multiple Views [49.03830902235915]
Estimating the 3D shape of an object from a single or multiple images has gained popularity thanks to the recent breakthroughs powered by deep learning.
This paper proposes to rely on viewpoint-variant reconstructions by merging the visible information from the given views.
To validate the proposed method, we perform a comprehensive evaluation on the ShapeNet reference benchmark in terms of relative pose estimation and 3D shape reconstruction.
arXiv Detail & Related papers (2020-11-17T09:59:32Z)
- CoReNet: Coherent 3D scene reconstruction from a single RGB image [43.74240268086773]
We build on advances in deep learning to reconstruct the shape of a single object given only one RGB image as input.
We propose three extensions: (1) ray-traced skip connections that propagate local 2D information to the output 3D volume in a physically correct manner; (2) a hybrid 3D volume representation that enables building translation equivariant models; and (3) a reconstruction loss tailored to capture overall object geometry.
We reconstruct all objects jointly in one pass, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space.
arXiv Detail & Related papers (2020-04-27T17:53:07Z)
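Of the three extensions in the entry above, the ray-traced skip connection is the most geometric: points along each camera ray are projected into the 2D feature map, and the sampled features populate the 3D volume. The sketch below assumes a pinhole camera and particular grid extents; it is not CoReNet's exact implementation.

```python
import torch
import torch.nn.functional as F

def ray_traced_skip(feat2d, K, D=32, depth_range=(0.5, 2.5)):
    """feat2d: (1, C, H, W) encoder features; K: (3, 3) camera intrinsics."""
    H, W = feat2d.shape[-2:]
    zs = torch.linspace(*depth_range, D)
    xs = torch.linspace(-1.0, 1.0, D)
    z, y, x = torch.meshgrid(zs, xs, xs, indexing="ij")
    pts = torch.stack([x * z, y * z, z], dim=-1).reshape(-1, 3)  # points on camera rays
    uv = pts @ K.T                                               # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]
    uv = uv / torch.tensor([W, H]) * 2 - 1                       # normalize to [-1, 1]
    feat = F.grid_sample(feat2d, uv.view(1, 1, -1, 2), align_corners=True)
    return feat.reshape(1, -1, D, D, D)    # (1, C, D, D, D) 2D features lifted to 3D

vol = ray_traced_skip(torch.randn(1, 8, 60, 80),
                      torch.tensor([[60., 0, 40], [0, 60., 30], [0, 0, 1]]))
```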
- Self-supervised Single-view 3D Reconstruction via Semantic Consistency [142.71430568330172]
We learn a self-supervised, single-view 3D reconstruction model that predicts the shape, texture and camera pose of a target object.
The proposed method does not necessitate 3D supervision, manually annotated keypoints, multi-view images of an object or a prior 3D template.
arXiv Detail & Related papers (2020-03-13T20:29:01Z)
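The training loop implied by the entry above can be sketched as a predict-render-compare cycle with no 3D labels. The renderer below is a trivial placeholder standing in for a differentiable renderer, and every layer size is an assumption.

```python
import torch
import torch.nn as nn

class SingleViewPredictor(nn.Module):
    def __init__(self, n_verts=642):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(),
                                      nn.Linear(3 * 64 * 64, 256), nn.ReLU())
        self.shape = nn.Linear(256, n_verts * 3)    # vertex offsets from a template
        self.texture = nn.Linear(256, n_verts * 3)  # per-vertex RGB
        self.camera = nn.Linear(256, 7)             # quaternion + translation

    def forward(self, img):
        h = self.backbone(img)
        return self.shape(h), self.texture(h), self.camera(h)

def render(shape, texture, camera):
    # Placeholder for a differentiable renderer (e.g. a soft rasterizer).
    return torch.sigmoid(shape.mean() + texture.mean()
                         + camera.mean()) * torch.ones(1, 3, 64, 64)

img = torch.rand(1, 3, 64, 64)
loss = (render(*SingleViewPredictor()(img)) - img).abs().mean()  # photo consistency
loss.backward()
```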