SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and
Quasi-Planar Segmentation
- URL: http://arxiv.org/abs/2306.16585v2
- Date: Fri, 13 Oct 2023 14:56:58 GMT
- Title: SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and
Quasi-Planar Segmentation
- Authors: Jingwen Wang, Juan Tarrio, Lourdes Agapito, Pablo F. Alcantarilla,
Alexander Vakhitov
- Abstract summary: We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems.
- Score: 53.83313235792596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The availability of real-time semantics greatly improves the core geometric
functionality of SLAM systems, enabling numerous robotic and AR/VR
applications. We present a new methodology for real-time semantic mapping from
RGB-D sequences that combines a 2D neural network and a 3D network based on a
SLAM system with 3D occupancy mapping. When segmenting a new frame we perform
latent feature re-projection from previous frames based on differentiable
rendering. Fusing re-projected feature maps from previous frames with
current-frame features greatly improves image segmentation quality, compared to
a baseline that processes images independently. For 3D map processing, we
propose a novel geometric quasi-planar over-segmentation method that groups 3D
map elements likely to belong to the same semantic classes, relying on surface
normals. We also describe a novel neural network design for lightweight
semantic map post-processing. Our system achieves state-of-the-art semantic
mapping quality among 2D-3D network-based systems and matches the performance of
3D convolutional networks on three real indoor datasets, while running in real
time. Moreover, it shows better cross-sensor generalization than 3D CNNs,
enabling training and inference with different depth sensors. Code and data
will be released on the project page:
http://jingwenwang95.github.io/SeMLaPS
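To make the latent feature re-projection concrete, below is a minimal PyTorch sketch of warping the previous frame's feature map into the current view. It assumes the SLAM system supplies per-frame depth, intrinsics and relative pose, and it substitutes a plain bilinear warp for the paper's differentiable rendering; all names are illustrative, not taken from the released code.

```python
# Sketch only: warp a previous frame's latent features into the current
# view using current-frame depth, intrinsics K and a relative pose.
import torch
import torch.nn.functional as F

def reproject_features(feat_prev, depth_curr, K, T_prev_curr):
    """feat_prev: (1, C, H, W) latent features of the previous frame.
    depth_curr: (1, 1, H, W) current-frame depth in metres.
    T_prev_curr: (4, 4) transform taking current-camera points into
    the previous camera frame."""
    _, _, H, W = depth_curr.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32),
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], 0).reshape(3, -1)
    # Back-project current pixels to 3D, then move into the previous camera.
    pts = (torch.linalg.inv(K) @ pix) * depth_curr.reshape(1, -1)
    pts = torch.cat([pts, torch.ones(1, pts.shape[1])], 0)
    pts_prev = (T_prev_curr @ pts)[:3]
    # Project into the previous image and normalise to [-1, 1].
    proj = K @ pts_prev
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                        uv[1] / (H - 1) * 2 - 1], -1).reshape(1, H, W, 2)
    # Bilinear lookup of the old features at the re-projected locations.
    return F.grid_sample(feat_prev, grid, align_corners=True)
```

Fusing the warped map with current-frame features (in the simplest case by averaging or concatenating them before the segmentation head) is what yields the reported gain over per-frame segmentation.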
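The quasi-planar over-segmentation can similarly be sketched as normal-guided region growing over the map's adjacency graph. The seed policy, angle threshold and data layout below are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch only: group adjacent map elements whose normals stay close to
# a segment's seed normal, yielding quasi-planar segments.
import numpy as np
from collections import deque

def quasi_planar_segments(normals, adjacency, angle_thresh_deg=15.0):
    """normals: (N, 3) unit normals of map elements (e.g. surfels).
    adjacency: list of neighbour-index lists per element."""
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    seg = np.full(len(normals), -1, dtype=int)
    next_id = 0
    for seed in range(len(normals)):
        if seg[seed] != -1:
            continue
        seg[seed], queue = next_id, deque([seed])
        while queue:
            i = queue.popleft()
            for j in adjacency[i]:
                # Compare with the seed normal so a segment stays roughly
                # planar instead of creeping around curved surfaces.
                if seg[j] == -1 and normals[j] @ normals[seed] > cos_thresh:
                    seg[j] = next_id
                    queue.append(j)
        next_id += 1
    return seg
```

Because each segment is likely to contain a single semantic class, pooling per-element predictions over segments is a natural input for the lightweight map post-processing network the abstract mentions.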
Related papers
- MeshConv3D: Efficient convolution and pooling operators for triangular 3D meshes [0.0]
MeshConv3D is a methodology dedicated to 3D meshes that integrates specialized convolution and face-collapse-based pooling operators.
Experiments on three benchmark datasets show that the approach achieves equivalent or superior classification results.
arXiv Detail & Related papers (2025-01-07T14:41:26Z)
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
The Large Spatial Model (LSM) converts unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
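As a rough illustration of online semantic reconstruction of this kind, per-voxel class beliefs can be fused incrementally as frames arrive. The scheme below is a generic Bayesian-fusion baseline, not ALSTER's local spatio-temporal expert:

```python
# Generic baseline, not ALSTER: keep per-voxel class log-probabilities
# and update them from each incoming frame's 2D segmentation.
import numpy as np

class OnlineSemanticMap:
    def __init__(self, num_voxels, num_classes):
        # Start from a uniform prior over classes.
        self.log_probs = np.full((num_voxels, num_classes),
                                 -np.log(num_classes))

    def integrate(self, voxel_ids, class_probs, eps=1e-6):
        """voxel_ids: (M,) voxels observed by the current frame;
        class_probs: (M, K) softmax outputs projected onto them."""
        # Bayesian update: multiply likelihoods, i.e. add log-probabilities.
        self.log_probs[voxel_ids] += np.log(class_probs + eps)
        lp = self.log_probs[voxel_ids]
        lp -= lp.max(axis=1, keepdims=True)           # numerical stability
        lp -= np.log(np.exp(lp).sum(axis=1, keepdims=True))
        self.log_probs[voxel_ids] = lp

    def labels(self):
        # Current semantic map: most probable class per voxel.
        return self.log_probs.argmax(axis=1)
```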
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation [26.726658200149544]
We propose the Implicit Ray-Transformer (IRT), based on Implicit Neural Representation (INR), for remote sensing (RS) scene semantic segmentation with sparse labels.
The proposed method includes a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene.
In the second stage, we design a Ray Transformer to leverage the relations between the neural field 3D features and 2D texture features for learning better semantic representations.
arXiv Detail & Related papers (2023-03-15T07:05:07Z)
- Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes [77.6741486264257]
We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs.
We show that our representation renders 2-3 orders of magnitude faster than previous works.
arXiv Detail & Related papers (2021-01-26T18:50:22Z)
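For context, neural SDFs are usually rendered by sphere tracing: each camera ray is advanced by the distance the SDF reports until it reaches the zero level set. The loop below is the standard algorithm, not the paper's accelerated data structure; the `sdf` callable stands in for a trained network:

```python
# Standard sphere tracing of a signed distance function (SDF).
import numpy as np

def sphere_trace(sdf, origins, dirs, max_steps=64, eps=1e-4, far=10.0):
    """sdf: callable mapping (N, 3) points to (N,) signed distances.
    origins, dirs: (N, 3) ray origins and unit directions."""
    t = np.zeros(len(origins))
    hit = np.zeros(len(origins), dtype=bool)
    for _ in range(max_steps):
        pts = origins + t[:, None] * dirs
        d = sdf(pts)
        hit |= np.abs(d) < eps            # ray has reached the surface
        active = ~hit & (t < far)
        if not active.any():
            break
        # Safe step: the SDF guarantees no surface closer than d.
        t[active] += d[active]
    return origins + t[:, None] * dirs, hit
```

The speed-up claimed above comes largely from making each `sdf(pts)` query cheap, not from changing this marching loop.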
- Learning Local Neighboring Structure for Robust 3D Shape Representation [143.15904669246697]
Representation learning for 3D meshes is important in many computer vision and graphics applications.
We propose a local structure-aware anisotropic convolutional operation (LSA-Conv).
Our model yields significant improvements in 3D shape reconstruction over state-of-the-art methods.
arXiv Detail & Related papers (2020-04-21T13:40:03Z)
- Pointwise Attention-Based Atrous Convolutional Neural Networks [15.499267533387039]
A pointwise attention-based atrous convolutional neural network architecture is proposed to efficiently deal with a large number of points.
The proposed model has been evaluated on the two most important 3D point cloud datasets for the 3D semantic segmentation task.
It achieves accuracy competitive with state-of-the-art models while using far fewer parameters.
arXiv Detail & Related papers (2019-12-27T13:12:58Z)