A Real-Time Online Learning Framework for Joint 3D Reconstruction and
Semantic Segmentation of Indoor Scenes
- URL: http://arxiv.org/abs/2108.05246v1
- Date: Wed, 11 Aug 2021 14:29:01 GMT
- Title: A Real-Time Online Learning Framework for Joint 3D Reconstruction and
Semantic Segmentation of Indoor Scenes
- Authors: Davide Menini, Suryansh Kumar, Martin R. Oswald, Erik Sandstrom,
Cristian Sminchisescu, Luc Van Gool
- Abstract summary: This paper presents a real-time online vision framework to jointly recover an indoor scene's 3D structure and semantic labels.
Given noisy depth maps, a camera trajectory, and 2D semantic labels at train time, the proposed neural network learns to fuse depth across frames with suitable semantic labels in the scene space.
- Score: 87.74952229507096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a real-time online vision framework to jointly recover an
indoor scene's 3D structure and semantic labels. Given noisy depth maps, a
camera trajectory, and 2D semantic labels at train time, the proposed neural
network learns to fuse depth across frames with suitable semantic labels in
the scene space. Our approach exploits the joint volumetric representation of
the depth and semantics in the scene feature space to solve this task. To
fuse semantic labels and geometry online in real time, we introduce an
efficient vortex pooling block and drop the routing network from online depth
fusion to preserve high-frequency surface details. We show that the context
information provided by the scene's semantics helps the depth fusion network
learn noise-resistant features. It also helps overcome
the shortcomings of the current online depth fusion method in dealing with thin
object structures, thickening artifacts, and false surfaces. Experimental
evaluation on the Replica dataset shows that our approach can perform depth
fusion at 37 or 10 frames per second, with average reconstruction F-scores of
88% and 91%, respectively, depending on the depth map resolution. Moreover,
our model achieves an average IoU score of 0.515 on the ScanNet 3D semantic
benchmark leaderboard.
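
The abstract's key architectural choice is the vortex pooling block used to aggregate semantic context. As a rough illustration, below is a minimal PyTorch sketch of a vortex-pooling-style module in the spirit of Xie et al. (2018); the branch count, pooling sizes, and dilation rates are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class VortexPooling(nn.Module):
    """Parallel avg-pool + dilated-conv branches, concatenated and fused."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 3, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # Average pooling cheaply smooths features at scale r ...
                nn.AvgPool2d(kernel_size=r, stride=1, padding=r // 2),
                # ... and a matching dilation samples context at that rate.
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # Project the concatenated branch outputs back to out_ch channels.
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```

For example, VortexPooling(64, 64)(torch.randn(1, 64, 32, 32)) returns a (1, 64, 32, 32) tensor: each branch gathers context at a different scale before the 1x1 projection, which matches the abstract's emphasis on an efficient context block.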
Related papers
- RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth
Completion [31.70022495622075]
We explore a repetitive design in our image-guided network to gradually and
sufficiently recover depth values.
In the image guidance branch, we design a dense repetitive hourglass network
(DRHN) to extract discriminative image features of complex environments.
In the depth generation branch, we present a repetitive guidance (RG) module
based on dynamic convolution, with an efficient convolution factorization that
reduces the complexity (sketched below).
In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint.
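
As a hedged illustration of the RG idea above, the sketch below implements one plausible form of image-guided dynamic filtering: the image features predict a softmax-normalized k x k kernel per pixel that is applied depthwise to the depth features, repeated a fixed number of times. The depthwise factorization and the kernel normalization are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepetitiveGuidance(nn.Module):
    def __init__(self, ch: int, k: int = 3, steps: int = 3):
        super().__init__()
        self.k, self.steps = k, steps
        # Predict one k*k kernel per pixel, shared across channels
        # (a depthwise factorization that keeps the cost low).
        self.kernel_head = nn.Conv2d(ch, k * k, 3, padding=1)

    def step(self, depth_feat, img_feat):
        b, c, h, w = depth_feat.shape
        kernels = F.softmax(self.kernel_head(img_feat), dim=1)  # (B, k*k, H, W)
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)
        patches = patches.view(b, c, self.k * self.k, h, w)
        # Weighted sum over each pixel's neighborhood, weights from the image.
        return (patches * kernels.unsqueeze(1)).sum(dim=2)

    def forward(self, depth_feat, img_feat):
        for _ in range(self.steps):  # the "repetitive" part: repeat refinement
            depth_feat = self.step(depth_feat, img_feat)
        return depth_feat
```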
arXiv Detail & Related papers (2023-09-01T09:11:20Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
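
To make "neural fields-based mapping" concrete, here is a minimal sketch of an implicit semantic map: an MLP that takes a positionally encoded 3D point and returns an SDF value plus semantic logits. The encoding scheme and layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SemanticField(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 128, freqs: int = 6):
        super().__init__()
        self.freqs = freqs
        in_dim = 3 * 2 * freqs  # sin/cos positional encoding of (x, y, z)
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sdf_head = nn.Linear(hidden, 1)            # signed distance
        self.sem_head = nn.Linear(hidden, num_classes)  # class logits

    def encode(self, xyz: torch.Tensor) -> torch.Tensor:
        # Multi-frequency sin/cos features, as in common neural-field work.
        bands = [2.0 ** i * torch.pi * xyz for i in range(self.freqs)]
        return torch.cat([f(b) for b in bands for f in (torch.sin, torch.cos)], -1)

    def forward(self, xyz: torch.Tensor):
        h = self.backbone(self.encode(xyz))
        return self.sdf_head(h), self.sem_head(h)
```

Querying SemanticField(num_classes=40) on (N, 3) points yields (N, 1) SDF values and (N, 40) logits, so geometry and semantics share one compact map.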
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from
Monocular Video [2.2299983745857896]
We present a novel real-time capable learning method that jointly perceives a 3D scene's geometric structure and semantic labels.
We propose an end-to-end cross-dimensional refinement neural network (CDRNet) to extract both 3D mesh and 3D semantic labeling in real time.
arXiv Detail & Related papers (2023-03-16T11:53:29Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the computation of local depth maps with a deep MVS technique, and 2) the fusion of the depth maps and image features into a single TSDF volume (the classical form of this update is sketched below).
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
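
For reference, the classical TSDF update that such fusion stages generalize is sketched below. This is the standard running weighted average of Curless and Levoy (1996), not VolumeFusion's learned feature fusion, and the function interface is hypothetical.

```python
import numpy as np

def integrate_tsdf(tsdf, weights, vox_xyz, depth, K, T_wc, trunc=0.04):
    """Fuse one depth frame into a global TSDF volume (hypothetical API).

    tsdf, weights : (N,) running TSDF values and fusion weights per voxel
    vox_xyz       : (N, 3) voxel centers in world coordinates
    depth         : (H, W) depth map; K: 3x3 intrinsics; T_wc: 4x4 world->cam
    """
    # Move voxel centers into the camera frame and project to pixels.
    cam = (T_wc[:3, :3] @ vox_xyz.T + T_wc[:3, 3:4]).T
    z = cam[:, 2]
    uv = (K @ cam.T).T
    u = (uv[:, 0] / np.maximum(z, 1e-6)).astype(int)
    v = (uv[:, 1] / np.maximum(z, 1e-6)).astype(int)
    h, w = depth.shape
    valid = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = depth[v.clip(0, h - 1), u.clip(0, w - 1)]
    valid &= d > 0
    # Truncated signed distance along the viewing ray.
    sdf = np.clip((d - z) / trunc, -1.0, 1.0)
    upd = valid & (sdf > -1.0)  # skip voxels far behind the surface
    # Running weighted average; learned methods replace this fixed rule.
    tsdf[upd] = (tsdf[upd] * weights[upd] + sdf[upd]) / (weights[upd] + 1.0)
    weights[upd] += 1.0
    return tsdf, weights
```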
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- NeuralFusion: Online Depth Fusion in Latent Space [77.59420353185355]
We present a novel online depth map fusion approach that learns depth map aggregation in a latent feature space.
Our approach is real-time capable, handles high noise levels, and is particularly able to deal with gross outliers common for photometric stereo-based depth maps.
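
A speculative sketch of the latent-space idea: each voxel carries a learned feature that an update network revises every frame, and a decoder produces TSDF values only when a surface is needed. The GRU-style update and the feature sizes are assumptions, not NeuralFusion's actual design.

```python
import torch
import torch.nn as nn

class LatentFusion(nn.Module):
    def __init__(self, feat_dim: int = 8, obs_dim: int = 4):
        super().__init__()
        self.update = nn.GRUCell(obs_dim, feat_dim)  # per-voxel state update
        self.decode = nn.Sequential(                 # latent -> TSDF value
            nn.Linear(feat_dim, 16), nn.ReLU(), nn.Linear(16, 1), nn.Tanh())

    def fuse(self, state: torch.Tensor, obs: torch.Tensor) -> torch.Tensor:
        # state: (V, feat_dim) latent per voxel; obs: (V, obs_dim) features
        # extracted from the current depth frame for the touched voxels.
        return self.update(obs, state)

    def tsdf(self, state: torch.Tensor) -> torch.Tensor:
        # Decode latents to TSDF only on demand (e.g., before meshing).
        return self.decode(state).squeeze(-1)
```

Keeping the fused state as a feature vector rather than a scalar is what lets a learned aggregation down-weight gross outliers instead of averaging them in.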
arXiv Detail & Related papers (2020-11-30T13:50:59Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic
Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture that processes occupancy maps and leverages voxel states to fuse semantic completion with the 3D global model accurately and efficiently.
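
One way to picture voxel states in incremental fusion is sketched below: a three-state scheme in which completed (hallucinated) semantics may only fill unknown space, so they never overwrite measured geometry. The state set and the gating rule are assumptions for illustration only, not SCFusion's published scheme.

```python
import numpy as np

UNKNOWN, FREE, OCCUPIED = 0, 1, 2

def fuse_completion(global_sem, states, pred_sem, pred_occ, observed):
    """Merge one frame's semantic completion into a global voxel model.

    global_sem: (N, C) accumulated class scores; states: (N,) voxel states
    pred_sem:   (N, C) per-voxel class scores from the completion network
    pred_occ:   (N,) predicted occupancy probability
    observed:   (N,) bool mask of voxels actually measured this frame
    """
    # Observed voxels get hard state updates from the sensor ...
    states[observed & (pred_occ > 0.5)] = OCCUPIED
    states[observed & (pred_occ <= 0.5)] = FREE
    # ... while completed (unobserved) voxels only refine UNKNOWN space,
    # so hallucinated geometry never overwrites real measurements.
    completable = ~observed & (states == UNKNOWN)
    target = observed | completable
    global_sem[target] += pred_sem[target]
    return global_sem, states
```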
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
- Shallow2Deep: Indoor Scene Modeling by Single Image Understanding [42.87957414916607]
We present an automatic indoor scene modeling approach using deep features from neural networks.
Given a single RGB image, our method simultaneously recovers semantic content, 3D geometry, and object relationships.
arXiv Detail & Related papers (2020-02-22T23:27:22Z)
- RoutedFusion: Learning Real-time Depth Map Fusion [73.0378509030908]
We present a novel real-time capable machine learning-based method for depth map fusion.
We propose a neural network that predicts non-linear updates to better account for typical fusion errors.
Our network is composed of a 2D depth routing network and a 3D depth fusion network which efficiently handle sensor-specific noise and outliers.
arXiv Detail & Related papers (2020-01-13T16:46:41Z)
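
Because the main paper above modifies exactly this design (it drops the routing stage to preserve high-frequency surface detail), a compact sketch of the two-network decomposition may help. The layer choices are illustrative, not RoutedFusion's published architecture.

```python
import torch
import torch.nn as nn

class DepthRouting2D(nn.Module):
    """Denoises a raw depth map and predicts a per-pixel confidence."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1))  # -> (refined depth, confidence)

    def forward(self, depth):
        out = self.net(depth)
        return out[:, :1], torch.sigmoid(out[:, 1:])

class DepthFusion3D(nn.Module):
    """Predicts a non-linear TSDF update for a local voxel neighborhood."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 1, 3, padding=1), nn.Tanh())

    def forward(self, tsdf_old, tsdf_new):
        # Stack the current volume with the frame's TSDF observation and
        # let the network decide how to blend them (instead of averaging).
        return self.net(torch.cat([tsdf_old, tsdf_new], dim=1))
```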