Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze
- URL: http://arxiv.org/abs/2009.06886v1
- Date: Tue, 15 Sep 2020 06:59:12 GMT
- Title: Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze
- Authors: Jinquan Li, Ling Pei, Danping Zou, Songpengcheng Xia, Qi Wu, Tao Li,
Zhen Sun, Wenxian Yu
- Abstract summary: This paper proposes a novel simultaneous localization and mapping (SLAM) approach, namely Attention-SLAM.
It combines a visual saliency model (SalNavNet) with traditional monocular visual SLAM.
Test results prove that Attention-SLAM outperforms benchmarks such as Direct Sparse Odometry (DSO), ORB-SLAM, and Salient DSO.
- Score: 19.99938539199779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes a novel simultaneous localization and mapping (SLAM)
approach, namely Attention-SLAM, which simulates human navigation mode by
combining a visual saliency model (SalNavNet) with traditional monocular visual
SLAM. Most SLAM methods treat all the features extracted from the images as
equally important during the optimization process. However, the salient feature
points in scenes have a more significant influence during the human navigation
process. Therefore, we first propose a visual saliency model called SalNavNet
in which we introduce a correlation module and propose an adaptive Exponential
Moving Average (EMA) module. These modules mitigate the center bias to enable
the saliency maps generated by SalNavNet to pay more attention to the same
salient object. Moreover, the saliency maps simulate the human behavior for the
refinement of SLAM results. The feature points extracted from the salient
regions have greater importance in the optimization process. We add semantic
saliency information to the EuRoC dataset to generate an open-source saliency
SLAM dataset. Comprehensive test results prove that Attention-SLAM outperforms
benchmarks such as Direct Sparse Odometry (DSO), ORB-SLAM, and Salient DSO in
terms of efficiency, accuracy, and robustness in most test cases.
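The idea that feature points from salient regions count more in optimization can be illustrated with a saliency-weighted reprojection cost. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the nearest-pixel lookup, and the weighting rule (base weight plus saliency value) are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple

Point2D = Tuple[float, float]


def sample_saliency(saliency: Sequence[Sequence[float]], point: Point2D) -> float:
    """Nearest-pixel lookup of a saliency value (assumed in [0, 1]) at image point (x, y)."""
    height, width = len(saliency), len(saliency[0])
    col = min(max(int(round(point[0])), 0), width - 1)
    row = min(max(int(round(point[1])), 0), height - 1)
    return saliency[row][col]


def weighted_reprojection_cost(
    observed: Sequence[Point2D],
    projected: Sequence[Point2D],
    saliency: Sequence[Sequence[float]],
    base_weight: float = 1.0,
) -> float:
    """Sum of squared reprojection errors, each scaled by a saliency-dependent weight.

    Assumption (hypothetical): weight = base_weight + saliency at the observed
    keypoint, so salient points contribute more and non-salient points keep
    the base weight.
    """
    total = 0.0
    for obs, proj in zip(observed, projected):
        dx = obs[0] - proj[0]
        dy = obs[1] - proj[1]
        weight = base_weight + sample_saliency(saliency, obs)
        total += weight * (dx * dx + dy * dy)
    return total


# Toy 2x2 saliency map: only the top-right pixel is salient.
saliency_map = [[0.0, 1.0], [0.0, 0.0]]
observed_points = [(0.0, 0.0), (1.0, 0.0)]
projected_points = [(1.0, 0.0), (2.0, 0.0)]
cost = weighted_reprojection_cost(observed_points, projected_points, saliency_map)
```

With a uniform zero saliency map every residual keeps the base weight, which corresponds to the equal-importance treatment the abstract attributes to most SLAM methods.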
Related papers
- Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z)
- DK-SLAM: Monocular Visual SLAM with Deep Keypoint Learning, Tracking and Loop-Closing [13.50980509878613]
Experimental evaluations on publicly available datasets demonstrate that DK-SLAM outperforms leading traditional and learning based SLAM systems.
Our system employs a Model-Agnostic Meta-Learning (MAML) strategy to optimize the training of keypoint extraction networks.
To mitigate cumulative positioning errors, DK-SLAM incorporates a novel online learning module that utilizes binary features for loop closure detection.
arXiv Detail & Related papers (2024-01-17T12:08:30Z)
- DNS SLAM: Dense Neural Semantic-Informed SLAM [92.39687553022605]
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experimental results achieve state-of-the-art performance on both synthetic data and real-world data tracking.
arXiv Detail & Related papers (2023-11-30T21:34:44Z)
- DVI-SLAM: A Dual Visual Inertial SLAM Network [31.067716365926845]
This paper proposes a novel deep SLAM network with dual visual factors.
We show that the proposed network dynamically learns and adjusts the confidence maps of both visual factors.
Extensive experiments validate that our proposed method significantly outperforms the state-of-the-art methods on several public datasets.
arXiv Detail & Related papers (2023-09-25T01:42:54Z)
- Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in optical remote sensing images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
- Depth Completion with Multiple Balanced Bases and Confidence for Dense Monocular SLAM [34.78726455243436]
We propose a novel method that integrates a light-weight depth completion network into a sparse SLAM system.
Specifically, we present an optimized multi-basis depth completion network, called BBC-Net.
BBC-Net can predict multiple balanced bases and a confidence map from a monocular image with sparse points generated by off-the-shelf keypoint-based SLAM systems.
arXiv Detail & Related papers (2023-09-08T06:15:27Z)
- NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM [111.83168930989503]
NICER-SLAM is a dense RGB SLAM system that simultaneously optimizes for camera poses and a hierarchical neural implicit map representation.
We show strong performance in dense mapping, tracking, and novel view synthesis, even competitive with recent RGB-D SLAM systems.
arXiv Detail & Related papers (2023-02-07T17:06:34Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- NICE-SLAM: Neural Implicit Scalable Encoding for SLAM [112.6093688226293]
NICE-SLAM is a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation.
Compared to recent neural implicit SLAM systems, our approach is more scalable, efficient, and robust.
arXiv Detail & Related papers (2021-12-22T18:45:44Z)
- Accurate Visual-Inertial SLAM by Feature Re-identification [4.263022790692934]
We propose an efficient drift-less SLAM method by re-identifying existing features from a spatial-temporal sensitive sub-global map.
Our method achieves 67.3% and 87.5% absolute translation error reduction with only a small additional computational cost.
arXiv Detail & Related papers (2021-02-26T12:54:33Z)
- A Hybrid Learner for Simultaneous Localization and Mapping [2.1041384320978267]
Simultaneous localization and mapping (SLAM) is used to predict the dynamic motion path of a moving platform.
This work introduces a hybrid learning model that explores beyond feature fusion.
It carries out weight enhancement of the front end feature extractor of the SLAM via mutation of different deep networks' top layers.
The trajectory predictions from independently trained models are amalgamated to refine the location detail.
arXiv Detail & Related papers (2021-01-04T18:41:09Z)