Shallow2Deep: Indoor Scene Modeling by Single Image Understanding
- URL: http://arxiv.org/abs/2002.09790v1
- Date: Sat, 22 Feb 2020 23:27:22 GMT
- Authors: Yinyu Nie, Shihui Guo, Jian Chang, Xiaoguang Han, Jiahui Huang,
Shi-Min Hu, Jian Jun Zhang
- Abstract summary: We present an automatic indoor scene modeling approach using deep features from neural networks.
Given a single RGB image, our method simultaneously recovers semantic contents, 3D geometry and object relationships.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dense indoor scene modeling from 2D images has been bottlenecked by the
absence of depth information and cluttered occlusions. We present an automatic
indoor scene modeling approach using deep features from neural networks. Given
a single RGB image, our method simultaneously recovers semantic contents, 3D
geometry and object relationships by reasoning about the indoor environment context.
In particular, we design a shallow-to-deep architecture on the basis of
convolutional networks for semantic scene understanding and modeling. It
involves multi-level convolutional networks that parse indoor semantics/geometry
into non-relational and relational knowledge. Non-relational knowledge
extracted from shallow-end networks (e.g. room layout, object geometry) is fed
forward into deeper levels to parse relational semantics (e.g. support
relationships). A Relation Network is proposed to infer the support relationship
between objects. All of the structured semantics and geometry above are assembled
to guide a global optimization for 3D scene modeling. Qualitative and
quantitative analysis demonstrates the feasibility of our method in
understanding and modeling semantics-enriched indoor scenes, evaluated in terms
of reconstruction accuracy, computational performance and scene complexity.
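The Relation Network mentioned above follows the general pairwise formulation of relation networks: every ordered pair of object features is scored by shared MLPs, one reasoning over each pair and one mapping the pair embedding to relation logits. Below is a minimal PyTorch sketch of that idea, assuming per-object feature vectors have already been extracted by the shallower networks; the feature dimension, layer widths, and relation categories are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a Relation Network head for support-relationship
# inference. Sizes and relation categories are illustrative assumptions.
import torch
import torch.nn as nn

class SupportRelationNet(nn.Module):
    """Scores a directed support relation for every ordered object pair."""

    def __init__(self, obj_dim=256, hidden=512, num_relations=3):
        super().__init__()
        # g: reasons over one (supported, supporting) feature pair.
        self.g = nn.Sequential(
            nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f: maps the pair embedding to relation logits
        # (assumed categories, e.g. {support, other-support, no-support}).
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_relations),
        )

    def forward(self, obj_feats):
        # obj_feats: (N, obj_dim), one feature vector per detected object.
        n = obj_feats.size(0)
        a = obj_feats.unsqueeze(1).expand(n, n, -1)  # candidate supported object
        b = obj_feats.unsqueeze(0).expand(n, n, -1)  # candidate supporting object
        pairs = torch.cat([a, b], dim=-1)            # (N, N, 2 * obj_dim)
        return self.f(self.g(pairs))                 # (N, N, num_relations) logits

feats = torch.randn(5, 256)           # e.g. 5 objects detected in one image
logits = SupportRelationNet()(feats)  # logits[i, j]: does object j support object i?
```

Feeding such a head with features produced by the shallow-end detection and geometry networks mirrors the shallow-to-deep flow the abstract describes: non-relational outputs become inputs to relational reasoning.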
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors
We extend a multi-view 3D semantic mapping system consisting of a network of distributed edge sensors with object-level information.
Our method is evaluated on the public Behave dataset, where it shows pose estimation within a few centimeters, and in real-world experiments with the sensor network in a challenging lab environment.
arXiv Detail & Related papers (2022-11-21T11:13:08Z)
- Self-Supervised Image Representation Learning with Geometric Set Consistency
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce feature consistency within image views.
arXiv Detail & Related papers (2022-03-29T08:57:33Z)
- A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes
This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic labels.
Given noisy depth maps, a camera trajectory, and 2D semantic labels at training time, the proposed neural network learns to fuse depth over frames with suitable semantic labels in the scene space.
arXiv Detail & Related papers (2021-08-11T14:29:01Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore learning a depth-specific structural representation, which captures the essential features for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet generalizes well to unseen real-world data even though it is trained only on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover the orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)