SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted
Objects and Visual Bag-of-Words
- URL: http://arxiv.org/abs/2110.11491v1
- Date: Thu, 21 Oct 2021 21:34:57 GMT
- Title: SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted
Objects and Visual Bag-of-Words
- Authors: Jonathan J.Y. Kim, Martin Urschler, Patricia J. Riddle, Jörg S. Wicker
- Abstract summary: Loop closure detection is an essential tool of SLAM to minimize drift in its localization.
Many state-of-the-art loop closure detection algorithms use visual Bag-of-Words (vBoW).
We propose SymbioLCD, a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW features for LCD candidate prediction.
- Score: 2.924868086534434
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Loop closure detection is an essential tool of Simultaneous Localization and
Mapping (SLAM) to minimize drift in its localization. Many state-of-the-art
loop closure detection (LCD) algorithms use visual Bag-of-Words (vBoW), which
is robust against partial occlusions in a scene but cannot perceive the
semantics or spatial relationships between feature points. CNN object
extraction can address those issues, by providing semantic labels and spatial
relationships between objects in a scene. Previous work has mainly focused on
replacing vBoW with CNN-derived features. In this paper, we propose SymbioLCD,
a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW
features for LCD candidate prediction. When used in tandem, the added elements
of object semantics and spatial-awareness create a more robust and symbiotic
loop closure detection system. The proposed SymbioLCD uses scale-invariant
spatial and semantic matching, Hausdorff distance with temporal constraints,
and a Random Forest that utilizes combined information from both CNN-extracted
objects and vBoW features for predicting accurate loop closure candidates.
Evaluation of the proposed method shows it outperforms other Machine Learning
(ML) algorithms, such as SVM, Decision Tree, and Neural Network, and
demonstrates that there is a strong symbiosis between CNN-extracted object
information and vBoW features, which assists accurate LCD candidate prediction.
Furthermore, it is able to perceive loop closure candidates earlier than
state-of-the-art SLAM algorithms, utilizing added spatial and semantic
information from CNN-extracted objects.
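The abstract above names two key ingredients: a Hausdorff distance over CNN-detected objects gated by a temporal constraint, and a Random Forest fed with combined object and vBoW features. The following is a minimal sketch of how those pieces could fit together; the feature layout, thresholds, and helper names are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: symmetric Hausdorff distance over object centroids with a
# temporal constraint, plus a Random Forest over a combined feature vector.
# Feature names and thresholds are hypothetical, not taken from the paper.
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sets of 2-D object centroids."""
    def directed(s, t):
        return max(min(math.dist(p, q) for q in t) for p in s)
    return max(directed(a, b), directed(b, a))


def is_candidate(frame_i, frame_j, dist, min_gap=30, max_dist=5.0):
    """Temporal constraint: frames far apart in time but close in object space."""
    return abs(frame_i - frame_j) >= min_gap and dist <= max_dist


# Object centroids detected in two frames (hypothetical values).
objs_a = [(10.0, 20.0), (40.0, 55.0), (70.0, 15.0)]
objs_b = [(12.0, 21.0), (41.0, 54.0), (69.0, 17.0)]
h = hausdorff(objs_a, objs_b)  # small distance: similar object layout

# Random Forest over a combined feature vector, e.g.
# [Hausdorff distance, semantic-label match ratio, vBoW similarity, scale ratio].
# Training data here is synthetic, purely to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = ((X[:, 1] > 0.5) & (X[:, 2] > 0.5)).astype(int)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(is_candidate(12, 487, h))               # far in time, close in space
print(clf.predict([[0.1, 0.9, 0.8, 1.0]])[0])  # likely a loop closure candidate
```

The symmetric Hausdorff distance is a natural fit here because it compares whole point sets without requiring a one-to-one object correspondence, while the temporal gap prevents trivially matching adjacent frames.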
Related papers
- Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection [57.883265488038134]
We propose a hierarchical graph interaction network termed HGINet for camouflaged object detection.
The network is capable of discovering imperceptible objects via effective graph interaction among the hierarchical tokenized features.
Our experiments demonstrate the superior performance of HGINet compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-08-27T12:53:25Z)
- LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion [79.22197702626542]
This paper introduces a framework that explores amodal segmentation for robotic grasping in cluttered scenes.
We propose a Linear-fusion Attention-guided Convolutional Network (LAC-Net)
The results on different datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-08-06T14:50:48Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Spatio-Temporal-based Context Fusion for Video Anomaly Detection [1.7710335706046505]
Video anomaly detection aims to discover abnormal events in videos, where the principal subjects are target objects such as people and vehicles.
Most existing methods only focus on the temporal context, ignoring the role of the spatial context in anomaly detection.
This paper proposes a video anomaly detection algorithm based on target-temporal context fusion.
arXiv Detail & Related papers (2022-10-18T04:07:10Z)
- Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes [2.236663830879273]
Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places.
Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems.
This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically.
arXiv Detail & Related papers (2022-09-24T00:42:33Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation.
Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z)
- Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework using Visual and Depth Cues [12.984393386954219]
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images.
We propose a complete framework to create an enhanced map representation of the environment with object-level information.
arXiv Detail & Related papers (2020-03-13T15:05:23Z)
- Depth Based Semantic Scene Completion with Position Importance Aware Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features from multi-stages using fine-grained depth information.
It is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.