SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted
Objects and Visual Bag-of-Words
- URL: http://arxiv.org/abs/2110.11491v1
- Date: Thu, 21 Oct 2021 21:34:57 GMT
- Title: SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted
Objects and Visual Bag-of-Words
- Authors: Jonathan J.Y. Kim, Martin Urschler, Patricia J. Riddle, Jörg S. Wicker
- Abstract summary: Loop closure detection is an essential tool of SLAM to minimize drift in its localization.
Many state-of-the-art loop closure detection algorithms use visual Bag-of-Words (vBoW).
We propose SymbioLCD, a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW features for LCD candidate prediction.
- Score: 2.924868086534434
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Loop closure detection is an essential tool of Simultaneous Localization and
Mapping (SLAM) to minimize drift in its localization. Many state-of-the-art
loop closure detection (LCD) algorithms use visual Bag-of-Words (vBoW), which
is robust against partial occlusions in a scene but cannot perceive the
semantics or spatial relationships between feature points. CNN object
extraction can address those issues by providing semantic labels and spatial
relationships between objects in a scene. Previous work has mainly focused on
replacing vBoW with CNN-derived features. In this paper, we propose SymbioLCD,
a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW
features for LCD candidate prediction. When used in tandem, the added elements
of object semantics and spatial-awareness create a more robust and symbiotic
loop closure detection system. The proposed SymbioLCD uses scale-invariant
spatial and semantic matching, Hausdorff distance with temporal constraints,
and a Random Forest that utilizes combined information from both CNN-extracted
objects and vBoW features for predicting accurate loop closure candidates.
Evaluation of the proposed method shows that it outperforms other machine
learning (ML) algorithms, such as SVM, Decision Tree, and Neural Network, and
demonstrates that there is a strong symbiosis between CNN-extracted object
information and vBoW features, which assists accurate LCD candidate prediction.
Furthermore, it is able to perceive loop closure candidates earlier than
state-of-the-art SLAM algorithms, utilizing added spatial and semantic
information from CNN-extracted objects.
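The abstract above describes combining object-level geometry (Hausdorff distance under a temporal constraint) with vBoW similarity in a Random Forest that predicts loop closure candidates. Below is a minimal sketch of that candidate-prediction idea using SciPy's Hausdorff distance and scikit-learn's Random Forest; the four-feature layout, training labels, and all numeric values are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.ensemble import RandomForestClassifier

def hausdorff_score(objs_a, objs_b):
    """Symmetric Hausdorff distance between two sets of 2-D object centroids."""
    return max(directed_hausdorff(objs_a, objs_b)[0],
               directed_hausdorff(objs_b, objs_a)[0])

# Hypothetical feature vector per frame pair: [object Hausdorff distance,
# semantic label-match ratio, vBoW similarity, temporal gap in frames].
# The temporal-gap feature stands in for the paper's temporal constraint.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 2] > 0.5).astype(int)  # toy labels keyed to vBoW similarity

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score a new frame pair as a loop-closure candidate.
frame_a = rng.random((5, 2))  # hypothetical object centroids in frame A
frame_b = rng.random((5, 2))  # hypothetical object centroids in frame B
features = [hausdorff_score(frame_a, frame_b), 0.8, 0.7, 120.0]
prob = clf.predict_proba([features])[0, 1]  # probability of loop closure
```

The ensemble then accepts a pair as a loop closure candidate when this probability exceeds a chosen threshold; the point of the combined feature vector is that geometric, semantic, and vBoW cues each compensate for the others' failure modes.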
Related papers
- StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory [21.300636683882338]
We propose a streaming network with a memory mechanism, called StreamMOS, to build the association of features and predictions among multiple inferences.
Specifically, we utilize a short-term memory to convey historical features, which can be regarded as a spatial prior for moving objects.
We also present a multi-view encoder with projection and asymmetric convolution to extract motion features of objects in different representations.
arXiv Detail & Related papers (2024-07-25T09:51:09Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Spatio-Temporal-based Context Fusion for Video Anomaly Detection [1.7710335706046505]
Video anomaly detection aims to discover abnormal events in videos; the principal subjects are target objects such as people and vehicles.
Most existing methods only focus on the temporal context, ignoring the role of the spatial context in anomaly detection.
This paper proposes a video anomaly detection algorithm based on spatio-temporal context fusion.
arXiv Detail & Related papers (2022-10-18T04:07:10Z)
- Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes [2.236663830879273]
Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places.
Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems.
This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically.
arXiv Detail & Related papers (2022-09-24T00:42:33Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation.
Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance than state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework using Visual and Depth Cues [12.984393386954219]
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images.
We propose a complete framework to create an enhanced map representation of the environment with object-level information.
arXiv Detail & Related papers (2020-03-13T15:05:23Z)
- Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching [102.62343739435289]
Existing image-text matching approaches infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image.
We propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically by recurrent neural networks (RNNs).
Our model achieves the state-of-the-art performance on Flickr30K dataset and competitive performance on MS-COCO dataset.
arXiv Detail & Related papers (2020-02-20T00:51:01Z)
- Depth Based Semantic Scene Completion with Position Importance Aware Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features from multi-stages using fine-grained depth information.
It is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.