Neural 3D Scene Reconstruction with the Manhattan-world Assumption
- URL: http://arxiv.org/abs/2205.02836v1
- Date: Thu, 5 May 2022 17:59:55 GMT
- Title: Neural 3D Scene Reconstruction with the Manhattan-world Assumption
- Authors: Haoyu Guo, Sida Peng, Haotong Lin, Qianqian Wang, Guofeng Zhang, Hujun
Bao, Xiaowei Zhou
- Abstract summary: This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
- Score: 58.90559966227361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the challenge of reconstructing 3D indoor scenes from
multi-view images. Many previous works have shown impressive reconstruction
results on textured objects, but they still have difficulty in handling
low-textured planar regions, which are common in indoor scenes. An approach to
solving this issue is to incorporate planar constraints into the depth map
estimation in multi-view stereo-based methods, but the per-view plane
estimation and depth optimization lack both efficiency and multi-view
consistency. In this work, we show that the planar constraints can be
conveniently integrated into the recent implicit neural representation-based
reconstruction methods. Specifically, we use an MLP network to represent the
signed distance function as the scene geometry. Based on the Manhattan-world
assumption, planar constraints are employed to regularize the geometry in floor
and wall regions predicted by a 2D semantic segmentation network. To resolve
the inaccurate segmentation, we encode the semantics of 3D points with another
MLP and design a novel loss that jointly optimizes the scene geometry and
semantics in 3D space. Experiments on ScanNet and 7-Scenes datasets show that
the proposed method outperforms previous methods by a large margin on 3D
reconstruction quality. The code is available at
https://zju3dv.github.io/manhattan_sdf.
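As a concrete illustration of the joint geometry-semantics regularization described in the abstract, here is a minimal sketch of how the Manhattan-world planar loss could look, assuming a PyTorch-style setup; the names (sdf_mlp, semantic_mlp), the three-class floor/wall/other semantics, and the fixed up axis are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch: Manhattan-world planar regularization for an SDF MLP.
# Assumptions (not the paper's released code): sdf_mlp maps (N, 3) points
# to (N, 1) signed distances; semantic_mlp maps (N, 3) points to logits
# over (floor, wall, other); z is the world up axis.
import torch
import torch.nn.functional as F

def surface_normals(sdf_mlp, points):
    """Surface normals of the implicit geometry: normalized SDF gradient."""
    points = points.detach().requires_grad_(True)
    sdf = sdf_mlp(points)
    (grad,) = torch.autograd.grad(sdf.sum(), points, create_graph=True)
    return F.normalize(grad, dim=-1)

def manhattan_loss(sdf_mlp, semantic_mlp, points):
    """Encourage floor normals to point up and wall normals to lie in the
    horizontal plane, weighted by 3D semantic probabilities so geometry
    and semantics are optimized jointly."""
    up = points.new_tensor([0.0, 0.0, 1.0])
    normals = surface_normals(sdf_mlp, points)       # (N, 3)
    probs = semantic_mlp(points).softmax(dim=-1)     # (N, 3): floor/wall/other
    cos_up = normals @ up                            # cosine to the up axis
    floor_term = probs[:, 0] * (1.0 - cos_up)        # floor: normal ≈ +up
    wall_term = probs[:, 1] * cos_up.abs()           # wall: normal ⟂ up
    return (floor_term + wall_term).mean()
```

The full method additionally aligns wall normals with learned dominant horizontal directions; the sketch keeps only the vertical/horizontal split for brevity.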
Related papers
- Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning [119.99066522299309]
KYN is a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density.
We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation.
We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work.
arXiv Detail & Related papers (2024-04-04T17:59:59Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Neural 3D Scene Reconstruction from Multiple 2D Images without 3D Supervision [41.20504333318276]
We propose a novel neural reconstruction method that reconstructs scenes using sparse depth under plane constraints, without 3D supervision.
We introduce a signed distance function field, a color field, and a probability field to represent a scene.
We optimize these fields to reconstruct the scene by using differentiable ray marching with accessible 2D images as supervision.
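As a rough sketch of what such differentiable ray marching can look like, the snippet below composites per-sample colors along each ray; the SDF-to-density mapping (a VolSDF-style Laplace CDF) and all names are assumptions for illustration, since this entry does not specify them.

```python
# Hedged sketch of SDF-based differentiable ray marching with 2D supervision.
# The Laplace-CDF density and all names are illustrative assumptions.
import torch

def render_ray_colors(sdf, rgb, deltas, beta=0.1):
    """Alpha-composite per-sample colors along rays.
    sdf:    (R, S)    signed distances at S samples on R rays
    rgb:    (R, S, 3) colors at the same samples
    deltas: (R, S)    spacing between consecutive samples
    """
    # Density from the SDF via a Laplace CDF: high near the zero level set.
    density = (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta)) / beta
    alpha = 1.0 - torch.exp(-density * deltas)           # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)   # transmittance
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans                              # (R, S)
    return (weights.unsqueeze(-1) * rgb).sum(dim=1)      # (R, 3)

# 2D supervision is then a photometric loss against the observed pixels, e.g.
# loss = torch.nn.functional.mse_loss(rendered, observed_rgb)
```

This shows only the compositing skeleton; the probability field and the exact SDF-to-density schedule differ per method.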
arXiv Detail & Related papers (2023-06-30T13:30:48Z)
- SimpleRecon: 3D Reconstruction Without 3D Convolutions [21.952478592241]
We show how focusing on high-quality multi-view depth prediction leads to highly accurate 3D reconstructions using simple off-the-shelf depth fusion.
Our method achieves a significant lead over the current state of the art for depth estimation, and is close to or better than it for 3D reconstruction on ScanNet and 7-Scenes.
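For reference, "off-the-shelf depth fusion" here usually means standard TSDF fusion of the predicted depth maps; the following is a minimal NumPy sketch of that step under common assumptions (pinhole intrinsics K, world-to-camera pose T_wc), not SimpleRecon's code.

```python
# Hedged sketch of standard TSDF fusion of predicted depth maps.
import numpy as np

def fuse_view(tsdf, weights, voxel_xyz, depth, K, T_wc, trunc=0.05):
    """Integrate one depth map into a running TSDF volume.
    tsdf, weights: (N,) running fused values per voxel
    voxel_xyz:     (N, 3) voxel centers in world coordinates
    depth:         (H, W) predicted depth map
    K: (3, 3) intrinsics; T_wc: (4, 4) world-to-camera extrinsics
    """
    # Transform voxel centers into the camera frame and project.
    cam = (T_wc[:3, :3] @ voxel_xyz.T + T_wc[:3, 3:4]).T   # (N, 3)
    z = cam[:, 2]
    zs = np.maximum(z, 1e-6)                               # avoid divide-by-zero
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / zs).astype(int)
    v = np.round(uv[:, 1] / zs).astype(int)
    H, W = depth.shape
    ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.where(ok, depth[v.clip(0, H - 1), u.clip(0, W - 1)], 0.0)
    ok &= d > 0
    # Truncated signed distance: positive in front of the observed surface.
    sdf = np.clip(d - z, -trunc, trunc) / trunc
    ok &= (d - z) > -trunc            # skip voxels far behind the surface
    # Running weighted average per voxel.
    w_new = weights + ok
    tsdf[ok] = (tsdf[ok] * weights[ok] + sdf[ok]) / w_new[ok]
    weights[:] = w_new
    return tsdf, weights
```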
arXiv Detail & Related papers (2022-08-31T09:46:34Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images [44.223070672713455]
Man-made environments commonly consist of volumetric primitives such as cuboids or cylinders.
Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects.
We propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids.
arXiv Detail & Related papers (2021-05-05T13:36:00Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that avoids the inconsistencies of such independent per-person reconstruction and generates a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images [13.154808583020229]
We present an end-to-end 3D reconstruction method for a scene by directly regressing a truncated signed distance function (TSDF) from a set of posed RGB images.
A 2D CNN extracts features from each image independently which are then back-projected and accumulated into a voxel volume.
A 3D CNN refines the accumulated features and predicts the TSDF values.
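A minimal sketch of the back-projection step described above, assuming known intrinsics/extrinsics and nearest-neighbour sampling for brevity; the names are illustrative, not the Atlas code.

```python
# Hedged sketch of Atlas-style feature back-projection: 2D CNN features are
# gathered into a voxel volume and averaged across views before a 3D CNN
# regresses per-voxel TSDF values. Nearest sampling is an assumption.
import torch

def backproject(feats, K, T_wc, voxel_xyz):
    """Accumulate per-view 2D features into a shared voxel volume.
    feats:     list of (C, H, W) feature maps, one per posed image
    K:         (3, 3) intrinsics; T_wc: list of (4, 4) world-to-camera poses
    voxel_xyz: (N, 3) voxel centers in world coordinates
    Returns (N, C) averaged features (zeros where no view sees the voxel).
    """
    N = voxel_xyz.shape[0]
    C, H, W = feats[0].shape
    acc = voxel_xyz.new_zeros(N, C)
    cnt = voxel_xyz.new_zeros(N, 1)
    for f, T in zip(feats, T_wc):
        cam = (T[:3, :3] @ voxel_xyz.T + T[:3, 3:4]).T   # camera frame
        z = cam[:, 2]
        zs = z.clamp(min=1e-6)                           # safe projection
        uv = (K @ cam.T).T
        u = (uv[:, 0] / zs).round().long()
        v = (uv[:, 1] / zs).round().long()
        ok = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        sampled = f[:, v.clamp(0, H - 1), u.clamp(0, W - 1)].T   # (N, C)
        acc += sampled * ok.unsqueeze(-1)
        cnt += ok.unsqueeze(-1)
    # A 3D CNN would then refine this volume and predict per-voxel TSDF.
    return acc / cnt.clamp(min=1)
```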
arXiv Detail & Related papers (2020-03-23T17:59:15Z)
- Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion [53.885984328273686]
Implicit Feature Networks (IF-Nets) deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data.
IF-Nets clearly outperform prior work in 3D object reconstruction on ShapeNet and obtain significantly more accurate 3D human reconstructions.
arXiv Detail & Related papers (2020-03-03T11:14:29Z)