SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted
Objects and Visual Bag-of-Words
- URL: http://arxiv.org/abs/2110.11491v1
- Date: Thu, 21 Oct 2021 21:34:57 GMT
- Title: SymbioLCD: Ensemble-Based Loop Closure Detection using CNN-Extracted
Objects and Visual Bag-of-Words
- Authors: Jonathan J.Y. Kim, Martin Urschler, Patricia J. Riddle, Jörg S. Wicker
- Abstract summary: Loop closure detection is an essential tool of SLAM to minimize drift in its localization.
Many state-of-the-art loop closure detection algorithms use visual Bag-of-Words (vBoW).
We propose SymbioLCD, a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW features for LCD candidate prediction.
- Score: 2.924868086534434
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Loop closure detection is an essential tool of Simultaneous Localization and
Mapping (SLAM) to minimize drift in its localization. Many state-of-the-art
loop closure detection (LCD) algorithms use visual Bag-of-Words (vBoW), which
is robust against partial occlusions in a scene but cannot perceive the
semantics or spatial relationships between feature points. CNN object
extraction can address those issues by providing semantic labels and spatial
relationships between objects in a scene. Previous work has mainly focused on
replacing vBoW with CNN-derived features. In this paper, we propose SymbioLCD,
a novel ensemble-based LCD that utilizes both CNN-extracted objects and vBoW
features for LCD candidate prediction. When used in tandem, the added elements
of object semantics and spatial-awareness create a more robust and symbiotic
loop closure detection system. The proposed SymbioLCD uses scale-invariant
spatial and semantic matching, Hausdorff distance with temporal constraints,
and a Random Forest that utilizes combined information from both CNN-extracted
objects and vBoW features for predicting accurate loop closure candidates.
Evaluation of the proposed method shows that it outperforms other machine
learning (ML) algorithms, such as SVM, Decision Tree, and Neural Network, and
demonstrates that there is a strong symbiosis between CNN-extracted object
information and vBoW features, which assists accurate LCD candidate prediction.
Furthermore, it is able to perceive loop closure candidates earlier than
state-of-the-art SLAM algorithms, utilizing added spatial and semantic
information from CNN-extracted objects.
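The abstract above describes combining object-level geometry (Hausdorff distance under a temporal constraint) with vBoW similarity in a Random Forest that predicts loop closure candidates. Below is a minimal sketch of that candidate-prediction idea using SciPy's Hausdorff distance and scikit-learn's Random Forest; the four-feature layout, training labels, and all numeric values are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.ensemble import RandomForestClassifier

def hausdorff_score(objs_a, objs_b):
    """Symmetric Hausdorff distance between two sets of 2-D object centroids."""
    return max(directed_hausdorff(objs_a, objs_b)[0],
               directed_hausdorff(objs_b, objs_a)[0])

# Hypothetical feature vector per frame pair: [object Hausdorff distance,
# semantic label-match ratio, vBoW similarity, temporal gap in frames].
# The temporal-gap feature stands in for the paper's temporal constraint.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 2] > 0.5).astype(int)  # toy labels keyed to vBoW similarity

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Score a new frame pair as a loop-closure candidate.
frame_a = rng.random((5, 2))  # hypothetical object centroids in frame A
frame_b = rng.random((5, 2))  # hypothetical object centroids in frame B
features = [hausdorff_score(frame_a, frame_b), 0.8, 0.7, 120.0]
prob = clf.predict_proba([features])[0, 1]  # probability of loop closure
```

The ensemble then accepts a pair as a loop closure candidate when this probability exceeds a chosen threshold; the point of the combined feature vector is that geometric, semantic, and vBoW cues each compensate for the others' failure modes.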
Related papers
- StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory [21.300636683882338]
We propose a streaming network with a memory mechanism, called StreamMOS, to build the association of features and predictions among multiple inferences.
Specifically, we utilize a short-term memory to convey historical features, which can be regarded as a spatial prior for moving objects.
We also present a multi-view encoder with projection and asymmetric convolution to extract motion features of objects in different representations.
arXiv Detail & Related papers (2024-07-25T09:51:09Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Spatio-Temporal-based Context Fusion for Video Anomaly Detection [1.7710335706046505]
Video anomaly detection aims to discover abnormal events in videos; the principal subjects are target objects such as people and vehicles.
Most existing methods only focus on the temporal context, ignoring the role of the spatial context in anomaly detection.
This paper proposes a video anomaly detection algorithm based on spatio-temporal context fusion.
arXiv Detail & Related papers (2022-10-18T04:07:10Z)
- Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes [2.236663830879273]
Loop Closure Detection (LCD) is essential to minimize drift when recognizing previously visited places.
Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems.
This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically.
arXiv Detail & Related papers (2022-09-24T00:42:33Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- Location-Sensitive Visual Recognition with Cross-IOU Loss [177.86369890708457]
This paper proposes a unified solution named location-sensitive network (LSNet) for object detection, instance segmentation, and pose estimation.
Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object.
arXiv Detail & Related papers (2021-04-11T02:17:14Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance than state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)
- Extending Maps with Semantic and Contextual Object Information for Robot Navigation: a Learning-Based Framework using Visual and Depth Cues [12.984393386954219]
This paper addresses the problem of building augmented metric representations of scenes with semantic information from RGB-D images.
We propose a complete framework to create an enhanced map representation of the environment with object-level information.
arXiv Detail & Related papers (2020-03-13T15:05:23Z)
- Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching [102.62343739435289]
Existing image-text matching approaches infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image.
We propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically by recurrent neural networks (RNNs).
Our model achieves the state-of-the-art performance on Flickr30K dataset and competitive performance on MS-COCO dataset.
arXiv Detail & Related papers (2020-02-20T00:51:01Z)
- Depth Based Semantic Scene Completion with Position Importance Aware Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features from multi-stages using fine-grained depth information.
It is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.