Shallow2Deep: Indoor Scene Modeling by Single Image Understanding
- URL: http://arxiv.org/abs/2002.09790v1
- Date: Sat, 22 Feb 2020 23:27:22 GMT
- Title: Shallow2Deep: Indoor Scene Modeling by Single Image Understanding
- Authors: Yinyu Nie, Shihui Guo, Jian Chang, Xiaoguang Han, Jiahui Huang,
Shi-Min Hu, Jian Jun Zhang
- Abstract summary: We present an automatic indoor scene modeling approach using deep features from neural networks.
Given a single RGB image, our method simultaneously recovers semantic contents, 3D geometry and object relationship.
- Score: 42.87957414916607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense indoor scene modeling from 2D images has been bottlenecked due to the
absence of depth information and cluttered occlusions. We present an automatic
indoor scene modeling approach using deep features from neural networks. Given
a single RGB image, our method simultaneously recovers semantic contents, 3D
geometry and object relationship by reasoning indoor environment context.
Particularly, we design a shallow-to-deep architecture on the basis of
convolutional networks for semantic scene understanding and modeling. It
involves multi-level convolutional networks to parse indoor semantics/geometry
into non-relational and relational knowledge. Non-relational knowledge
extracted from shallow-end networks (e.g. room layout, object geometry) is fed
forward into deeper levels to parse relational semantics (e.g. support
relationship). A Relation Network is proposed to infer the support relationship
between objects. All the structured semantics and geometry above are assembled
to guide a global optimization for 3D scene modeling. Qualitative and
quantitative analysis demonstrates the feasibility of our method in
understanding and modeling semantics-enriched indoor scenes, evaluated in terms
of reconstruction accuracy, computational performance and scene complexity.
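The Relation Network described above scores pairwise relations between detected objects to infer support relationships. A minimal sketch of that idea, assuming toy feature sizes, random MLP weights, and a concatenated room-layout feature (none of which come from the paper's actual implementation):

```python
# Illustrative sketch (not the paper's code): a tiny Relation Network that
# scores ordered object pairs (i supports j), conditioned on a room-layout
# feature from the shallow-end networks. All sizes/weights are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, used to score one object pair."""
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def support_scores(obj_feats, layout_feat, params):
    """Score every ordered object pair; diagonal (self-support) stays zero."""
    n = obj_feats.shape[0]
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pair = np.concatenate([obj_feats[i], obj_feats[j], layout_feat])
            scores[i, j] = mlp(pair, *params).item()
    return scores

# Toy setup: 3 objects with 8-dim features, a 4-dim layout feature.
d_obj, d_layout, n_hidden = 8, 4, 16
d_in = 2 * d_obj + d_layout
params = (rng.normal(size=(d_in, n_hidden)), np.zeros(n_hidden),
          rng.normal(size=(n_hidden, 1)), np.zeros(1))
objs = rng.normal(size=(3, d_obj))
layout = rng.normal(size=d_layout)

S = support_scores(objs, layout, params)
# The most likely supporter of object j is the argmax over column j.
supporter_of = S.argmax(axis=0)
print(S.shape, supporter_of.shape)
```

In the paper's pipeline such pairwise scores would be trained and then fed, with the other structured semantics, into the global optimization for 3D scene modeling.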
Related papers
- Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors [25.393382192511716]
We extend a multi-view 3D semantic mapping system consisting of a network of distributed edge sensors with object-level information.
Our method is evaluated on the public Behave dataset where it shows pose estimation within a few centimeters and in real-world experiments with the sensor network in a challenging lab environment.
arXiv Detail & Related papers (2022-11-21T11:13:08Z)
- Self-Supervised Image Representation Learning with Geometric Set Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z)
- A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes [87.74952229507096]
This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic label.
Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed neural network learns to fuse the depth over frames with suitable semantic labels in the scene space.
arXiv Detail & Related papers (2021-08-11T14:29:01Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet can be well generalized to unseen real-world data directly even though it is only trained on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Predicting Relative Depth between Objects from Semantic Features [2.127049691404299]
The 3D depth of objects depicted in 2D images is one such feature.
The state of the art in this area uses complex neural network models trained on stereo image data to predict depth per pixel.
An overall increase of 14% in relative depth accuracy is achieved over relative depth computed from the monodepth model's results.
arXiv Detail & Related papers (2021-01-12T17:28:23Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
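The analysis-by-synthesis entry above fits a pose by gradient descent on the discrepancy between a synthesized and an observed image. A toy 1-D sketch of that loop, assuming a stand-in synthesis function and finite-difference gradients rather than the paper's neural synthesis module:

```python
# Illustrative sketch (not the paper's method): analysis-by-synthesis as a
# toy 1-D problem -- recover an orientation angle by gradient descent on the
# mismatch between a "synthesized" observation and the target observation.
import math

def synthesize(angle):
    """Stand-in for a neural image-synthesis module: maps a pose parameter
    to an observation (here, a point on the unit circle)."""
    return (math.cos(angle), math.sin(angle))

def loss(angle, target):
    sx, sy = synthesize(angle)
    tx, ty = target
    return (sx - tx) ** 2 + (sy - ty) ** 2

def fit_pose(target, angle=0.0, lr=0.5, steps=200, eps=1e-4):
    """Gradient-based fitting; the gradient is taken by finite differences."""
    for _ in range(steps):
        g = (loss(angle + eps, target) - loss(angle - eps, target)) / (2 * eps)
        angle -= lr * g
    return angle

true_angle = 1.0
est = fit_pose(synthesize(true_angle))
print(round(est, 3))  # converges near the true orientation
```

The real method replaces `synthesize` with a learned network that spans the pose configuration space efficiently, but the fitting loop has this same shape.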
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.