Multi-modal Visual Place Recognition in Dynamics-Invariant Perception Space
- URL: http://arxiv.org/abs/2105.07800v1
- Date: Mon, 17 May 2021 13:14:52 GMT
- Title: Multi-modal Visual Place Recognition in Dynamics-Invariant Perception Space
- Authors: Lin Wu, Teng Wang, Changyin Sun
- Abstract summary: This letter explores the use of multi-modal fusion of semantic and visual modalities to improve place recognition in dynamic environments.
We achieve this by first designing a novel deep learning architecture to generate the static semantic segmentation and recover the static image directly from the corresponding dynamic image.
We then leverage the spatial-pyramid-matching model to encode the static semantic segmentation into feature vectors.
In parallel, the static image is encoded using the popular Bag-of-words model.
- Score: 23.43468556831308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual place recognition is one of the essential and challenging problems in
the field of robotics. In this letter, we explore, for the first time, the use
of multi-modal fusion of semantic and visual modalities in a dynamics-invariant
space to improve place recognition in dynamic environments. We achieve this by
first designing a novel deep learning architecture to generate the static
semantic segmentation and recover the static image directly from the
corresponding dynamic image. We then leverage the
spatial-pyramid-matching model to encode the static semantic segmentation into
feature vectors. In parallel, the static image is encoded using the popular
Bag-of-Words model. On the basis of these multi-modal features, we finally
measure the similarity between the query image and each target landmark by the
joint similarity of their semantic and visual codes. Extensive experiments
demonstrate the effectiveness and robustness of the proposed approach for place
recognition in dynamic environments.
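The spatial-pyramid-matching step of the semantic branch can be made concrete with a short sketch. Below, a map of per-pixel class labels is split into progressively finer grids, a per-class histogram is computed in each cell, and all histograms are concatenated into one semantic code. The pyramid depth, the L1 normalization, and the function name `spm_encode` are illustrative assumptions; the letter does not publish its exact configuration.

```python
import numpy as np

def spm_encode(seg: np.ndarray, num_classes: int, levels: int = 3) -> np.ndarray:
    """Encode a per-pixel semantic label map as a spatial-pyramid vector.

    At pyramid level l the map is divided into a 2**l x 2**l grid; each
    cell contributes an L1-normalized class histogram, and all histograms
    are concatenated. Depth and normalization are assumed here, not taken
    from the paper.
    """
    h, w = seg.shape
    parts = []
    for level in range(levels):
        cells = 2 ** level
        for i in range(cells):
            for j in range(cells):
                cell = seg[i * h // cells:(i + 1) * h // cells,
                           j * w // cells:(j + 1) * w // cells]
                hist = np.bincount(cell.ravel(), minlength=num_classes).astype(float)
                parts.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(parts)

# Usage: a 480x640 label map with 20 semantic classes yields a
# 20 * (1 + 4 + 16) = 420-dimensional semantic code.
sem_code = spm_encode(np.random.randint(0, 20, (480, 640)), num_classes=20)
```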
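The visual branch and the final fusion admit a similar sketch. The snippet below quantizes ORB descriptors of the recovered static image against a pre-trained visual vocabulary, then fuses the two modalities with a weighted sum of cosine similarities. ORB, Euclidean word assignment, and the weight `alpha = 0.5` are assumptions made for illustration; the letter states only that a Bag-of-Words model and a joint similarity are used.

```python
import cv2          # pip install opencv-python
import numpy as np

def bow_encode(image: np.ndarray, vocabulary: np.ndarray) -> np.ndarray:
    """Histogram of visual words for `image`.

    `vocabulary` is a (k, 32) array of ORB word centroids learned offline
    (e.g., by k-means). ORB and Euclidean assignment are illustrative
    choices; ORB descriptors are more commonly matched with Hamming distance.
    """
    _, desc = cv2.ORB_create().detectAndCompute(image, None)
    hist = np.zeros(len(vocabulary))
    if desc is None:                      # no keypoints detected
        return hist
    # Assign each descriptor to its nearest visual word.
    dists = np.linalg.norm(desc[:, None, :].astype(float)
                           - vocabulary[None, :, :].astype(float), axis=2)
    np.add.at(hist, dists.argmin(axis=1), 1.0)
    return hist / max(hist.sum(), 1.0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def joint_similarity(sem_q, sem_t, vis_q, vis_t, alpha: float = 0.5) -> float:
    """Weighted fusion of semantic and visual similarity (alpha is assumed)."""
    return alpha * cosine(sem_q, sem_t) + (1.0 - alpha) * cosine(vis_q, vis_t)
```

In a retrieval loop, the query's `joint_similarity` against every stored landmark would be ranked and the best-scoring landmark returned as the recognized place.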
Related papers
- Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective [10.938290904843939]
  We propose "Bi-level Optimization of Learning Dynamic with Decoupling and Intervention" (BOLD-DI) to capture both static and dynamic semantics in a decoupled manner.
  Our method can be seamlessly integrated into existing v-CL methods, and experimental results highlight significant improvements.
  arXiv Detail & Related papers (2024-07-19T06:53:54Z)
- Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM [17.661231232206028]
  Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention.
  We propose a novel SLAM framework for dynamic environments.
  arXiv Detail & Related papers (2024-07-18T09:35:48Z)
- Dynamic in Static: Hybrid Visual Correspondence for Self-Supervised Video Object Segmentation [126.12940972028012]
  We present HVC, a framework for self-supervised video object segmentation.
  HVC extracts pseudo-dynamic signals from static images, enabling an efficient and scalable VOS model.
  We propose a hybrid visual correspondence loss to learn joint static and dynamic consistency representations.
  arXiv Detail & Related papers (2024-04-21T02:21:30Z)
- Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization [61.64304227831361]
  Single-domain generalization aims to learn a model from single source domain data to achieve generalized performance on other unseen target domains.
  We propose a dynamic object-centric perception network based on prompt learning, aiming to adapt to the variations in image complexity.
  arXiv Detail & Related papers (2024-02-28T16:16:51Z)
- Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
  High dynamic range imaging aims to retrieve information from multiple low-dynamic-range inputs to generate realistic output.
  Existing methods often focus on the spatial misalignment across input frames caused by foreground and/or camera motion.
  We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
  arXiv Detail & Related papers (2023-05-29T15:03:23Z)
- Learning to Model Multimodal Semantic Alignment for Story Visualization [58.16484259508973]
  Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story.
  Current works face the problem of semantic misalignment because of their fixed architectures and the diversity of input modalities.
  We explore the semantic alignment between text and image representations by learning to match their semantic levels in a GAN-based generative model.
  arXiv Detail & Related papers (2022-11-14T11:41:44Z)
- Dynamic Texture Recognition via Nuclear Distances on Kernelized Scattering Histogram Spaces [95.21606283608683]
  This work proposes to describe dynamic textures as kernelized spaces of frame-wise feature vectors computed using the Scattering transform.
  By combining these spaces with a basis-invariant metric, we get a framework that produces competitive results for nearest-neighbor classification and state-of-the-art results for nearest-class-center classification.
  arXiv Detail & Related papers (2021-02-01T13:54:24Z)
- Empty Cities: a Dynamic-Object-Invariant Space for Visual SLAM [6.693607456009373]
  We present a data-driven approach to obtain the static image of a scene, eliminating dynamic objects that might have been present at the time of traversing the scene with a camera.
  We introduce an end-to-end deep learning framework to turn images of an urban environment into realistic static frames suitable for localization and mapping.
  arXiv Detail & Related papers (2020-10-15T10:31:12Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
  We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
  The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
  arXiv Detail & Related papers (2020-07-21T04:03:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.