SemanticSLAM: Learning based Semantic Map Construction and Robust Camera
Localization
- URL: http://arxiv.org/abs/2401.13076v1
- Date: Tue, 23 Jan 2024 20:02:02 GMT
- Title: SemanticSLAM: Learning based Semantic Map Construction and Robust Camera
Localization
- Authors: Mingyang Li, Yue Ma, and Qinru Qiu
- Abstract summary: We introduce SemanticSLAM, an end-to-end visual-inertial odometry system.
SemanticSLAM uses semantic features extracted from an RGB-D sensor.
It operates effectively in indoor settings, even with infrequent camera input.
- Score: 8.901799744401314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current techniques in Visual Simultaneous Localization and Mapping (VSLAM)
estimate camera displacement by comparing image features of consecutive scenes.
These algorithms depend on scene continuity and hence require frequent camera
inputs. However, processing images frequently can lead to significant memory
usage and computation overhead. In this study, we introduce SemanticSLAM, an
end-to-end visual-inertial odometry system that utilizes semantic features
extracted from an RGB-D sensor. This approach enables the creation of a
semantic map of the environment and ensures reliable camera localization.
SemanticSLAM is scene-agnostic, which means it doesn't require retraining for
different environments. It operates effectively in indoor settings, even with
infrequent camera input, without prior knowledge. The strength of SemanticSLAM
lies in its ability to gradually refine the semantic map and improve pose
estimation. This is achieved by a convolutional long short-term memory
(ConvLSTM) network, trained to correct errors during map construction. Compared
to existing VSLAM algorithms, SemanticSLAM improves pose estimation by 17%. The
resulting semantic map provides interpretable information about the environment
and can be easily applied to various downstream tasks, such as path planning,
obstacle avoidance, and robot navigation. The code will be publicly available
at https://github.com/Leomingyangli/SemanticSLAM
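To make the pipeline described in the abstract more concrete, below is a minimal, self-contained sketch of its two named ingredients: a ConvLSTM cell that folds each projected semantic observation into a running top-down map, and a brute-force correlation search standing in for the learned localization step. This is not the authors' released code (see the repository linked above); the shapes, class count, and correlation-based pose search are illustrative assumptions, and the network is left untrained so the example runs end to end.

```python
# Illustrative sketch only; not the SemanticSLAM implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLSTMCell(nn.Module):
    """A standard convolutional LSTM cell, used here to fuse each new semantic
    observation into the running global map (the paper trains such a network
    to correct errors during map construction)."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

def estimate_offset(local_obs, global_map):
    """Score every translation of the egocentric semantic observation against
    the global map and return the best grid offset (a brute-force stand-in
    for the learned localization head)."""
    # local_obs: (C, h, w); global_map: (C, H, W)
    scores = F.conv2d(global_map.unsqueeze(0), local_obs.unsqueeze(0))
    best = int(torch.argmax(scores))
    return divmod(best, scores.shape[-1])  # (row, col) of the best match

if __name__ == "__main__":
    C, H, W = 8, 64, 64                       # semantic classes, map size (assumed)
    cell = ConvLSTMCell(C, C)
    state = (torch.zeros(1, C, H, W), torch.zeros(1, C, H, W))
    with torch.no_grad():
        for _ in range(5):                    # one projected RGB-D observation per step
            obs = torch.rand(1, C, H, W)      # semantic features projected to the grid
            map_belief, state = cell(obs, state)
        row, col = estimate_offset(torch.rand(C, 16, 16), map_belief[0])
    print("best map offset (row, col):", row, col)
```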
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- Loopy-SLAM: Dense Neural SLAM with Loop Closures [53.11936461015725]
We introduce Loopy-SLAM, which globally optimizes poses and the dense 3D model.
We perform frame-to-model tracking using a data-driven point-based submap generation method and trigger loop closures online through global place recognition.
Evaluation on the synthetic Replica and real-world TUM-RGBD and ScanNet datasets demonstrates competitive or superior performance in tracking, mapping, and rendering accuracy when compared to existing dense neural RGBD SLAM methods.
arXiv Detail & Related papers (2024-02-14T18:18:32Z)
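As a rough, generic illustration of triggering loop closures online via global place recognition, as mentioned in the Loopy-SLAM summary above: the descriptor source, temporal gap, and similarity threshold below are placeholder assumptions, not the paper's values.

```python
# Generic place-recognition loop-closure trigger; not the Loopy-SLAM code.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def detect_loop(descriptors, current, min_gap=30, threshold=0.9):
    """Return the index of the best-matching earlier place, or None.
    `descriptors` holds one global descriptor per keyframe/submap."""
    best_idx, best_sim = None, threshold
    for idx, d in enumerate(descriptors[: max(0, len(descriptors) - min_gap)]):
        sim = cosine(d, current)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = [rng.normal(size=256) for _ in range(100)]
    query = db[10] + 0.05 * rng.normal(size=256)   # revisit of place 10
    print("loop closure with keyframe:", detect_loop(db, query))
```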
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our method achieves state-of-the-art tracking performance on both synthetic and real-world data.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient dense 3D geometry and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
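The following is a minimal sketch of what "neural fields-based mapping" with a semantic output can look like: a small MLP queried at 3D points returns occupancy, colour, and per-class semantic logits. The architecture is an assumption for illustration, not this paper's actual network.

```python
# Toy neural-field map with a semantic head; architecture is illustrative only.
import torch
import torch.nn as nn

class SemanticField(nn.Module):
    """Maps a 3D point to occupancy, RGB colour and per-class semantic logits."""
    def __init__(self, num_classes=10, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.occ = nn.Linear(hidden, 1)
        self.rgb = nn.Linear(hidden, 3)
        self.sem = nn.Linear(hidden, num_classes)

    def forward(self, xyz):
        h = self.trunk(xyz)
        return torch.sigmoid(self.occ(h)), torch.sigmoid(self.rgb(h)), self.sem(h)

if __name__ == "__main__":
    field = SemanticField()
    pts = torch.rand(1024, 3)                  # 3D points sampled along camera rays
    occ, rgb, sem_logits = field(pts)
    print(occ.shape, rgb.shape, sem_logits.argmax(dim=-1).shape)
```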
- Visual Localization via Few-Shot Scene Region Classification [84.34083435501094]
Visual (re)localization addresses the problem of estimating the 6-DoF camera pose of a query image captured in a known scene.
Recent advances in structure-based localization solve this problem by memorizing the mapping from image pixels to scene coordinates.
We propose a scene region classification approach to achieve fast and effective scene memorization with few-shot images.
arXiv Detail & Related papers (2022-08-14T22:39:02Z)
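The standard way a "mapping from image pixels to scene coordinates", as mentioned in the summary above, is turned into a 6-DoF pose is PnP with RANSAC. The sketch below uses OpenCV for that step; the scene coordinates are synthetic and stand in for a network's predictions, and the intrinsics are assumed values.

```python
# Pose from pixel-to-scene-coordinate correspondences via PnP + RANSAC.
import cv2
import numpy as np

K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1.0]])   # assumed intrinsics

rng = np.random.default_rng(1)
scene_xyz = rng.uniform([-2, -2, 4], [2, 2, 8], size=(200, 3))       # 3D scene coordinates
rvec_gt = np.array([0.1, -0.2, 0.05])                                # ground-truth pose
tvec_gt = np.array([0.3, -0.1, 0.2])
pixels, _ = cv2.projectPoints(scene_xyz, rvec_gt, tvec_gt, K, None)  # "predicted" 2D-3D matches
pixels = pixels.reshape(-1, 2) + rng.normal(scale=0.5, size=(200, 2))

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    scene_xyz.astype(np.float32), pixels.astype(np.float32), K, None,
    reprojectionError=3.0)
print("recovered translation:", tvec.ravel(), "inliers:", len(inliers))
```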
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
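A rough sketch of the idea in the ImPosing summary above: two encoders map images and candidate poses into a shared latent space, and the pose is refined by ranking candidates hierarchically rather than regressing it directly. The tiny MLPs, feature sizes, and the grid search are illustrative placeholders, not the paper's architecture.

```python
# Coarse-to-fine candidate ranking in a shared latent space; illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

img_encoder = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 32))
pose_encoder = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 32))

def refine_position(img_feat, centre, half_extent, levels=3, grid=5):
    """At each level, embed a grid of candidate positions, keep the one most
    similar to the image embedding, and shrink the search box around it."""
    z_img = F.normalize(img_encoder(img_feat), dim=-1)
    for _ in range(levels):
        axis = torch.linspace(-half_extent, half_extent, grid)
        xs, ys = torch.meshgrid(axis, axis, indexing="ij")
        cands = torch.stack([xs.flatten(), ys.flatten(),
                             torch.zeros(grid * grid)], dim=-1) + centre
        z_pose = F.normalize(pose_encoder(cands), dim=-1)
        best = torch.argmax(z_pose @ z_img)       # cosine-similarity ranking
        centre = cands[best]
        half_extent /= grid                       # zoom in around the best candidate
    return centre

if __name__ == "__main__":
    img_feat = torch.rand(512)                    # stand-in for a CNN image descriptor
    print(refine_position(img_feat, torch.zeros(3), half_extent=10.0))
```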
- CodeMapping: Real-Time Dense Mapping for Sparse SLAM using Compact Scene Representations [20.79223452551813]
State-of-the-art sparse visual SLAM systems provide accurate estimates of the camera trajectory and locations of landmarks.
While these sparse maps are useful for localization, they cannot be used for other tasks such as obstacle avoidance or scene understanding.
We propose a dense mapping framework to complement sparse visual SLAM systems, which takes as input the camera poses and sparse points produced by the SLAM system and predicts a dense depth image for every keyframe.
arXiv Detail & Related papers (2021-07-19T16:13:18Z)
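The sketch below only mirrors the input/output interface described in the CodeMapping summary above, i.e. densifying a keyframe's sparse SLAM depth: a small CNN takes the grayscale image plus the sparse depth and regresses a dense depth map. The real system uses a compact learned scene representation; this network is an assumption made for illustration.

```python
# Depth-completion interface sketch; not the CodeMapping architecture.
import torch
import torch.nn as nn

class DepthCompletionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus())   # depth is positive

    def forward(self, gray, sparse_depth):
        return self.net(torch.cat([gray, sparse_depth], dim=1))

if __name__ == "__main__":
    gray = torch.rand(1, 1, 120, 160)            # keyframe image (downsampled)
    sparse = torch.zeros(1, 1, 120, 160)         # depth only where landmarks project
    idx = torch.randint(0, 120 * 160, (300,))
    sparse.view(-1)[idx] = torch.rand(300) * 5.0
    dense = DepthCompletionNet()(gray, sparse)
    print(dense.shape)                           # one depth value per pixel
```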
- LatentSLAM: unsupervised multi-sensor representation learning for localization and mapping [7.857987850592964]
We propose an unsupervised representation learning method that yields low-dimensional latent state descriptors.
Our method is sensor agnostic and can be applied to any sensor modality.
We show how combining multiple sensors can increase robustness by reducing the number of false matches.
arXiv Detail & Related papers (2021-05-07T13:44:32Z)
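A toy illustration of the multi-sensor idea in the LatentSLAM summary above: per-sensor encoders produce low-dimensional latents that are concatenated into one place descriptor, and requiring both modalities to agree tightens the matching test. The encoders here are untrained placeholders and the sensors and dimensions are assumptions.

```python
# Multi-sensor latent place descriptors and matching; illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

cam_enc = nn.Sequential(nn.Linear(1024, 64), nn.ReLU(), nn.Linear(64, 16))
range_enc = nn.Sequential(nn.Linear(360, 64), nn.ReLU(), nn.Linear(64, 16))

def describe(cam_obs, range_obs):
    """Concatenate per-sensor latents into one low-dimensional place descriptor."""
    z = torch.cat([cam_enc(cam_obs), range_enc(range_obs)], dim=-1)
    return F.normalize(z, dim=-1)

def match(query, database, threshold=0.8):
    """Return indices of database places whose descriptor is similar enough;
    fusing both sensors makes this test harder to pass by accident."""
    sims = database @ query
    return (sims > threshold).nonzero(as_tuple=True)[0]

if __name__ == "__main__":
    with torch.no_grad():
        db = torch.stack([describe(torch.rand(1024), torch.rand(360)) for _ in range(50)])
        print(match(describe(torch.rand(1024), torch.rand(360)), db))
```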
- Visual Camera Re-Localization Using Graph Neural Networks and Relative Pose Supervision [31.947525258453584]
Visual re-localization means using a single image as input to estimate the camera's location and orientation relative to a pre-recorded environment.
Our proposed method makes few special assumptions, and is fairly lightweight in training and testing.
We validate the effectiveness of our approach on both standard indoor (7-Scenes) and outdoor (Cambridge Landmarks) camera re-localization benchmarks.
arXiv Detail & Related papers (2021-04-06T14:29:03Z)
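A minimal worked example of the basic step behind relative-pose-based re-localization as summarised above: a predicted relative pose of the query with respect to a retrieved reference image is composed with the reference's known absolute pose. The 4x4 transforms below are invented for illustration.

```python
# Composing a predicted relative pose with a known reference pose.
import numpy as np

def se3(yaw_rad, t):
    """Build a 4x4 homogeneous transform from a yaw rotation and translation t."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    T[:3, 3] = t
    return T

# Absolute pose of the retrieved reference image in the world frame (known from the map).
T_world_ref = se3(0.3, [1.0, 2.0, 0.0])
# Relative pose of the query camera in the reference frame (the network's prediction).
T_ref_query = se3(-0.1, [0.2, 0.0, 0.05])
# Compose to obtain the query camera's absolute pose.
T_world_query = T_world_ref @ T_ref_query
print(T_world_query[:3, 3])   # estimated query position in the world frame
```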
- Scale Normalized Image Pyramids with AutoFocus for Object Detection [75.71320993452372]
A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales.
We propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects.
The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.
arXiv Detail & Related papers (2021-02-10T18:57:53Z)
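A back-of-the-envelope illustration of the two ideas in the summary above: each pyramid scale only attends to objects whose rescaled size falls in a fixed range (SNIP), and only fixed-size sub-regions likely to contain objects are processed further (AutoFocus). The scales, size range, chip size, and boxes are invented for the example.

```python
# Scale filtering per pyramid level plus fixed-size "chips"; illustrative only.
import numpy as np

SCALES = [0.5, 1.0, 2.0]             # image pyramid scales
VALID = (64.0, 192.0)                # object sizes (px) each scale attends to

def boxes_for_scale(boxes, scale, valid=VALID):
    """Keep only boxes whose longer side, after rescaling, lies in the valid range."""
    sizes = np.maximum(boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]) * scale
    return boxes[(sizes >= valid[0]) & (sizes <= valid[1])]

def focus_chips(boxes, chip=256):
    """Fixed-size sub-regions centred on surviving boxes: only these regions
    would be run through the detector at this scale."""
    centres = (boxes[:, :2] + boxes[:, 2:]) / 2
    return np.hstack([centres - chip / 2, centres + chip / 2])

if __name__ == "__main__":
    boxes = np.array([[10, 10, 50, 60],          # small object
                      [100, 100, 260, 240],      # medium object
                      [0, 0, 600, 500]], float)  # large object
    for s in SCALES:
        kept = boxes_for_scale(boxes, s)
        print(f"scale {s}: {len(kept)} boxes attended")
        if len(kept):
            print(focus_chips(kept))
```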
This list is automatically generated from the titles and abstracts of the papers on this site.