ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic
Reconstruction
- URL: http://arxiv.org/abs/2311.18068v2
- Date: Sun, 3 Dec 2023 08:33:52 GMT
- Authors: Silvan Weder, Francis Engelmann, Johannes L. Schönberger, Akihito
Seki, Marc Pollefeys, Martin R. Oswald
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an online 3D semantic segmentation method that incrementally
reconstructs a 3D semantic map from a stream of RGB-D frames. Unlike offline
methods, ours is directly applicable to scenarios with real-time constraints,
such as robotics or mixed reality. To overcome the inherent challenges of
online methods, we make two main contributions. First, to effectively extract
information from the input RGB-D video stream, we jointly estimate geometry and
semantic labels per frame in 3D. A key focus of our approach is to reason about
semantic entities both in the 2D input and the local 3D domain to leverage
differences in spatial context and network architectures. Our method predicts
2D features using an off-the-shelf segmentation network. The extracted 2D
features are refined by a lightweight 3D network to enable reasoning about the
local 3D structure. Second, to efficiently deal with an infinite stream of
input RGB-D frames, a subsequent network serves as a temporal expert predicting
the incremental scene updates by leveraging 2D, 3D, and past information in a
learned manner. These updates are then integrated into a global scene
representation. Together, these contributions enable operation under
real-time constraints and allow the method to scale to arbitrary scene sizes,
since the scene is processed and updated only in the local region defined by
each new measurement. Our experiments demonstrate improved results over
existing online methods that operate purely in local regions and show that
complementary sources of information boost performance. We provide a thorough
ablation study of our architectural and algorithmic design decisions. Our
method yields competitive results on the popular ScanNet benchmark and the
SceneNN dataset.
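
To make the pipeline concrete, below is a minimal PyTorch sketch of such an
online update loop. All names (Refiner3D, TemporalExpert, online_step) and
the toy layer choices are hypothetical stand-ins for the stages named in the
abstract, not the authors' implementation.

    import torch
    import torch.nn as nn

    class Refiner3D(nn.Module):
        # Stand-in for the lightweight 3D network: refines back-projected
        # 2D segmentation features with local 3D context.
        def __init__(self, c=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(c, c, 3, padding=1), nn.ReLU(),
                nn.Conv3d(c, c, 3, padding=1))

        def forward(self, x):
            return self.net(x)

    class TemporalExpert(nn.Module):
        # Stand-in for the temporal expert: fuses the current frame's local
        # evidence with the stored past state into an incremental update.
        def __init__(self, c=32):
            super().__init__()
            self.fuse = nn.Conv3d(2 * c, c, 1)

        def forward(self, current, past):
            return self.fuse(torch.cat([current, past], dim=1))

    def online_step(global_feats, local_feats_2d, bounds, refiner, expert):
        # local_feats_2d: 2D features back-projected into the local voxel
        # grid touched by the new RGB-D frame, shape (B, C, D, H, W).
        feats_3d = refiner(local_feats_2d)             # local 3D reasoning
        z0, z1, y0, y1, x0, x1 = bounds                # region of this frame
        past = global_feats[:, :, z0:z1, y0:y1, x0:x1]
        update = expert(feats_3d, past)                # learned temporal fusion
        global_feats[:, :, z0:z1, y0:y1, x0:x1] = update  # local integration
        return global_feats

    # Toy usage: a 64^3 global feature grid, updated in a 16^3 local region.
    refiner, expert = Refiner3D(), TemporalExpert()
    grid = torch.zeros(1, 32, 64, 64, 64)
    local = torch.randn(1, 32, 16, 16, 16)
    with torch.no_grad():
        grid = online_step(grid, local, (0, 16, 0, 16, 0, 16), refiner, expert)

Because only the cropped local region is read and written, the cost of one
step is independent of the total scene size, which is what allows such a
method to scale to arbitrarily large scenes.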
Related papers
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and
Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception
from Monocular Video [2.2299983745857896]
We present a novel real-time capable learning method that jointly perceives a 3D scene's geometry structure and semantic labels.
We propose an end-to-end cross-dimensional refinement neural network (CDRNet) to extract both 3D mesh and 3D semantic labeling in real time.
arXiv Detail & Related papers (2023-03-16T11:53:29Z)
- SSR-2D: Semantic 3D Scene Reconstruction from 2D Images [54.46126685716471]
In this work, we explore a central 3D scene modeling task, namely, semantic scene reconstruction without using any 3D annotations.
The key idea of our approach is to design a trainable model that employs both incomplete 3D reconstructions and their corresponding source RGB-D images.
Our method achieves state-of-the-art semantic scene completion performance on two large-scale benchmark datasets, MatterPort3D and ScanNet.
arXiv Detail & Related papers (2023-02-07T17:47:52Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud
Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our method performs favorably against current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based
Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to use 3D voxelization and 3D convolution networks.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic
Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a 3D cylinder partition and 3D cylinder convolution based framework, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
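
The cylindrical partition used by the two LiDAR papers above is easy to
state concretely. The following NumPy sketch maps points (x, y, z) to
cylindrical voxel indices (radius, azimuth, height); the bin counts and
ranges are illustrative assumptions, not the papers' settings.

    import numpy as np

    def cylindrical_partition(points, rho_bins=480, phi_bins=360, z_bins=32,
                              rho_max=50.0, z_min=-4.0, z_max=2.0):
        # points: (N, 3) array of LiDAR coordinates (x, y, z).
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        rho = np.sqrt(x**2 + y**2)        # radial distance from the sensor
        phi = np.arctan2(y, x)            # azimuth angle in [-pi, pi]
        r_idx = np.clip((rho / rho_max * rho_bins).astype(int),
                        0, rho_bins - 1)
        p_idx = np.clip(((phi + np.pi) / (2 * np.pi) * phi_bins).astype(int),
                        0, phi_bins - 1)
        z_idx = np.clip(((z - z_min) / (z_max - z_min) * z_bins).astype(int),
                        0, z_bins - 1)
        return np.stack([r_idx, p_idx, z_idx], axis=1)  # (N, 3) voxel indices

    # Toy usage: bucket 1000 random points into cylindrical voxels.
    pts = np.random.uniform([-50, -50, -4], [50, 50, 2], size=(1000, 3))
    vox = cylindrical_partition(pts)

Compared with a regular Cartesian grid, this keeps voxel occupancy better
matched to the radially varying density of LiDAR points, which is the
motivation behind the cylindrical design.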