STA-VPR: Spatio-temporal Alignment for Visual Place Recognition
- URL: http://arxiv.org/abs/2103.13580v1
- Date: Thu, 25 Mar 2021 03:27:42 GMT
- Title: STA-VPR: Spatio-temporal Alignment for Visual Place Recognition
- Authors: Feng Lu, Baifan Chen, Xiang-Dong Zhou and Dezhen Song
- Abstract summary: We propose an adaptive dynamic time warping algorithm to align local features in the spatial domain while measuring the distance between two images.
A local matching DTW algorithm is applied to perform image sequence matching based on temporal alignment.
The results show that the proposed method significantly improves the performance of CNN-based methods.
- Score: 17.212503755962757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the methods based on Convolutional Neural Networks (CNNs) have
gained popularity in the field of visual place recognition (VPR). In
particular, the features from the middle layers of CNNs are more robust to
drastic appearance changes than handcrafted features and high-layer features.
Unfortunately, the holistic mid-layer features lack robustness to large
viewpoint changes. Here we split the holistic mid-layer features into local
features and propose an adaptive dynamic time warping (DTW) algorithm to align
local features in the spatial domain while measuring the distance between two
images. This realizes viewpoint-invariant and condition-invariant place
recognition. Meanwhile, a local matching DTW (LM-DTW) algorithm is applied to
perform image sequence matching based on temporal alignment, which achieves
further improvements and ensures linear time complexity. We perform extensive
experiments on five representative VPR datasets. The results show that the
proposed method significantly improves the performance of CNN-based methods. Moreover, our
method outperforms several state-of-the-art methods while maintaining good
run-time performance. This work provides a novel way to boost the performance
of CNN methods without any re-training for VPR. The code is available at
https://github.com/Lu-Feng/STA-VPR.
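To make the spatial-alignment idea concrete, here is a minimal NumPy sketch of measuring an image distance by DTW over local features, assuming each image's mid-layer feature map has already been split along the width axis into a sequence of local feature vectors. The plain cosine cost and the normalization are illustrative simplifications of the paper's adaptive DTW; LM-DTW applies the same recursion over image sequences, with these image distances as the per-frame cost.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two local feature vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def dtw_image_distance(feats_a, feats_b):
    """Distance between two images as the normalized DTW alignment cost
    over their sequences of local features (a mid-layer feature map
    split along the width axis).

    feats_a: (W1, D) array, feats_b: (W2, D) array.
    """
    w1, w2 = len(feats_a), len(feats_b)
    acc = np.full((w1 + 1, w2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, w1 + 1):
        for j in range(1, w2 + 1):
            cost = cosine_distance(feats_a[i - 1], feats_b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # vertical step
                                   acc[i, j - 1],      # horizontal step
                                   acc[i - 1, j - 1])  # diagonal match
    return acc[w1, w2] / (w1 + w2)  # normalize by an upper path-length bound

# Example: two feature maps split into 14 local features of dimension 512.
img_a, img_b = np.random.randn(14, 512), np.random.randn(14, 512)
print(dtw_image_distance(img_a, img_b))
```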
Related papers
- VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition [23.173085268845384]
This paper introduces VLAD-BuFF, a self-similarity based feature discounting mechanism to learn burst-aware features within end-to-end VPR training.
We benchmark our method on 9 public datasets, where VLAD-BuFF sets a new state of the art.
Our method is able to maintain its high recall even for 12x reduced local feature dimensions, thus enabling fast feature aggregation without compromising on recall.
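The abstract does not spell out the discounting rule; the following is a rough sketch of one plausible self-similarity-based discounting, where each local descriptor is down-weighted by how many near-duplicates ("bursts") it has within the same image. The function name and temperature are illustrative, not the paper's.

```python
import numpy as np

def burst_discount_weights(desc, temperature=0.1):
    """Hypothetical burst-aware discounting: a descriptor with many
    highly similar neighbors in the same image ("burstiness") receives
    a small aggregation weight.

    desc: (N, D) L2-normalized local descriptors of one image.
    Returns per-descriptor weights in (0, 1].
    """
    sim = desc @ desc.T                    # pairwise cosine similarity
    burstiness = np.exp(sim / temperature).sum(axis=1)  # soft duplicate count
    return burstiness.min() / burstiness   # 1.0 for the least bursty descriptor

desc = np.random.randn(100, 128)
desc /= np.linalg.norm(desc, axis=1, keepdims=True)
weighted = desc * burst_discount_weights(desc)[:, None]  # pre-aggregation
```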
arXiv Detail & Related papers (2024-09-28T09:44:08Z)
- Vector Field Attention for Deformable Image Registration [9.852055065890479]
Deformable image registration establishes non-linear spatial correspondences between fixed and moving images.
Most existing deep learning-based methods require neural networks to encode location information in their feature maps.
We present Vector Field Attention (VFA), a novel framework that enhances the efficiency of the existing network design by enabling direct retrieval of location correspondences.
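As a hedged illustration of "direct retrieval of location correspondences", the sketch below uses the moving image's coordinate grid as the attention values, so the attention output is itself a correspondence field rather than mixed features; the actual VFA architecture differs in its details.

```python
import torch

def vector_field_attention(feat_fixed, feat_moving):
    """Attention whose values are the moving image's coordinate grid,
    so the output is directly a field of matched locations.

    feat_fixed, feat_moving: (B, C, H, W) feature maps.
    Returns: (B, 2, H, W) expected (x, y) correspondences.
    """
    b, c, h, w = feat_fixed.shape
    q = feat_fixed.flatten(2).transpose(1, 2)   # (B, HW, C) queries (fixed)
    k = feat_moving.flatten(2).transpose(1, 2)  # (B, HW, C) keys (moving)
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).view(1, h * w, 2).expand(b, -1, -1)
    attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)  # (B, HW, HW)
    matched = attn @ coords                     # soft-argmax match location
    return matched.transpose(1, 2).reshape(b, 2, h, w)

field = vector_field_attention(torch.randn(1, 64, 16, 16),
                               torch.randn(1, 64, 16, 16))
```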
arXiv Detail & Related papers (2024-07-14T14:06:58Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
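A minimal sketch of the cross-image idea, assuming per-image global descriptors and treating the batch as one token sequence so that self-attention correlates images with each other; the dimensions and single-block design are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CrossImageCorrelation(nn.Module):
    """Refine per-image descriptors by attending across the batch, so
    each image is represented in the context of the other images."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, D). Treat the batch as ONE sequence of B tokens so that
        # self-attention runs across images, not within one image.
        tokens = x.unsqueeze(0)                   # (1, B, D)
        out, _ = self.attn(tokens, tokens, tokens)
        return self.norm(out.squeeze(0) + x)      # residual refinement

refined = CrossImageCorrelation()(torch.randn(16, 512))  # batch of 16 images
```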
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
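For intuition, here is a sketch of the geometric-verification step used for re-ranking, with classical RANSAC homography fitting (via OpenCV) standing in for the paper's learnable transformer-based estimator; the `matches` variable in the usage comment is assumed given.

```python
import numpy as np
import cv2

def homography_inlier_score(pts_query, pts_cand, thresh=5.0):
    """Re-ranking score for a retrieval candidate: fit a homography to
    matched keypoints and count RANSAC inliers.

    pts_query, pts_cand: (N, 2) matched pixel coordinates in the two images.
    """
    if len(pts_query) < 4:  # a homography needs at least 4 matches
        return 0
    H, mask = cv2.findHomography(pts_query.astype(np.float32),
                                 pts_cand.astype(np.float32),
                                 cv2.RANSAC, thresh)
    return 0 if mask is None else int(mask.sum())

# Re-rank the top-k candidates from global retrieval by inlier count;
# matches[i] = (pts_in_query, pts_in_candidate_i) is assumed to be given.
# ranked = sorted(range(k), key=lambda i: -homography_inlier_score(*matches[i]))
```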
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences [79.21515035128832]
We conduct a thorough analysis of the performance of current state-of-the-art feature extraction networks under various domain changes.
We propose a novel data-centric method, Implicit Cross-Domain Correspondences (iCDC)
iCDC represents the same environment with multiple Neural Radiance Fields, each fitting the scene under individual visual domains.
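The summary leaves the correspondence generation implicit; as a loose sketch under strong assumptions (known intrinsics, camera poses, and per-domain depth, e.g. rendered from the NeRF fitted to that domain), a cross-domain pixel correspondence can be obtained by unprojecting from one domain's image and reprojecting into another's.

```python
import numpy as np

def cross_domain_correspondence(uv, depth, K, T_a2w, T_w2b):
    """Unproject a pixel from a domain-A image using its depth, then
    reproject it into a domain-B image of the same scene.

    uv: (2,) pixel in image A; depth: metric depth at uv; K: (3, 3)
    intrinsics; T_a2w: (4, 4) camera-A-to-world; T_w2b: (4, 4)
    world-to-camera-B. Returns the matching pixel in image B.
    """
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p_cam_a = depth * ray                             # 3D point in camera A
    p_world = (T_a2w @ np.append(p_cam_a, 1.0))[:3]   # into the world frame
    p_cam_b = (T_w2b @ np.append(p_world, 1.0))[:3]   # into camera B's frame
    return (K @ p_cam_b)[:2] / p_cam_b[2]             # perspective projection
```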
arXiv Detail & Related papers (2023-11-06T18:53:01Z)
- AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR), which uses visual information to localize robots, is a research hotspot in robotics.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
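The exact ShPSM rule is not given in the summary; the sketch below shows one common reading of semi-hard positive selection, picking the hardest geographic positive that is still closer to the query in descriptor space than any negative.

```python
import numpy as np

def semi_hard_positive(query_desc, pos_descs, neg_descs):
    """Select the hardest geographic positive that is still closer to
    the query than every negative, so training pairs are informative
    but unlikely to be label noise.

    query_desc: (D,); pos_descs: (P, D); neg_descs: (M, D).
    Returns the index of the selected positive.
    """
    d_pos = np.linalg.norm(pos_descs - query_desc, axis=1)
    d_neg_min = np.linalg.norm(neg_descs - query_desc, axis=1).min()
    semi_hard = np.where(d_pos < d_neg_min)[0]  # closer than every negative
    if len(semi_hard) == 0:
        return int(d_pos.argmin())              # fall back to easiest positive
    return int(semi_hard[d_pos[semi_hard].argmax()])  # hardest semi-hard one
```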
arXiv Detail & Related papers (2023-10-08T14:46:11Z)
- Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods [17.852415436033436]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification.
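Steps (2) and (3) of this loop can be sketched as follows, with `verify` an assumed geometric-verification callable and all other names illustrative:

```python
import numpy as np

def refine_positive_sets(feats, positives, temporal_nbrs, verify, k=10):
    """One expansion/contraction pass over the positive sets.

    feats: (N, D) current L2-normalized descriptors; positives: list of
    index sets; temporal_nbrs: list of temporally adjacent index sets;
    verify(i, j) -> bool: assumed geometric-verification callable.
    """
    sims = feats @ feats.T
    for i in range(len(feats)):
        knn = np.argsort(-sims[i])[1:k + 1]            # feature-space neighbors
        expanded = positives[i] | {int(j) for j in knn} | temporal_nbrs[i]
        positives[i] = {j for j in expanded if verify(i, j)}  # contraction
    return positives
```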
arXiv Detail & Related papers (2022-08-19T12:59:46Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to perform dynamic convolution on their corresponding input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
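A minimal PyTorch sketch of one such branch, assuming per-sample depthwise 3x3 filters predicted from the input itself; the generator design and sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicModalityFilter(nn.Module):
    """One modality branch: predict per-sample depthwise 3x3 filters
    from the input feature map, then convolve the same map with them."""

    def __init__(self, channels=64, k=3):
        super().__init__()
        self.k = k
        self.gen = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(channels, channels * k * k))

    def forward(self, x):  # x: (B, C, H, W), visible or thermal features
        b, c, h, w = x.shape
        filt = self.gen(x).view(b * c, 1, self.k, self.k)  # dynamic filters
        out = F.conv2d(x.reshape(1, b * c, h, w), filt,
                       padding=self.k // 2, groups=b * c)  # per-sample depthwise
        return out.view(b, c, h, w)

out = DynamicModalityFilter()(torch.randn(2, 64, 32, 32))  # thermal branch is analogous
```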
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.