STA-VPR: Spatio-temporal Alignment for Visual Place Recognition
- URL: http://arxiv.org/abs/2103.13580v1
- Date: Thu, 25 Mar 2021 03:27:42 GMT
- Title: STA-VPR: Spatio-temporal Alignment for Visual Place Recognition
- Authors: Feng Lu, Baifan Chen, Xiang-Dong Zhou and Dezhen Song
- Abstract summary: We propose an adaptive dynamic time warping algorithm to align local features in the spatial domain while measuring the distance between two images.
A local matching DTW algorithm is applied to perform image sequence matching based on temporal alignment.
The results show that the proposed method significantly improves the performance of CNN-based methods.
- Score: 17.212503755962757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the methods based on Convolutional Neural Networks (CNNs) have
gained popularity in the field of visual place recognition (VPR). In
particular, the features from the middle layers of CNNs are more robust to
drastic appearance changes than handcrafted features and high-layer features.
Unfortunately, the holistic mid-layer features lack robustness to large
viewpoint changes. Here we split the holistic mid-layer features into local
features and propose an adaptive dynamic time warping (DTW) algorithm to align
local features in the spatial domain while measuring the distance between two
images. This realizes viewpoint-invariant and condition-invariant place
recognition. Meanwhile, a local matching DTW (LM-DTW) algorithm is applied to
perform image sequence matching based on temporal alignment, which achieves
further improvements and ensures linear time complexity. We perform extensive
experiments on five representative VPR datasets. The results show that the
proposed method significantly improves the performance of CNN-based methods. Moreover, our
method outperforms several state-of-the-art methods while maintaining good
run-time performance. This work provides a novel way to boost the performance
of CNN methods without any re-training for VPR. The code is available at
https://github.com/Lu-Feng/STA-VPR.
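To make the spatial-alignment idea concrete, here is a minimal NumPy sketch of measuring an image distance by DTW over local features, assuming each image's mid-layer feature map has already been split along the width axis into a sequence of local feature vectors. The plain cosine cost and the normalization are illustrative simplifications of the paper's adaptive DTW; LM-DTW applies the same recursion over image sequences, with these image distances as the per-frame cost.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two local feature vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def dtw_image_distance(feats_a, feats_b):
    """Distance between two images as the normalized DTW alignment cost
    over their sequences of local features (a mid-layer feature map
    split along the width axis).

    feats_a: (W1, D) array, feats_b: (W2, D) array.
    """
    w1, w2 = len(feats_a), len(feats_b)
    acc = np.full((w1 + 1, w2 + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, w1 + 1):
        for j in range(1, w2 + 1):
            cost = cosine_distance(feats_a[i - 1], feats_b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # vertical step
                                   acc[i, j - 1],      # horizontal step
                                   acc[i - 1, j - 1])  # diagonal match
    return acc[w1, w2] / (w1 + w2)  # normalize by an upper path-length bound

# Example: two feature maps split into 14 local features of dimension 512.
img_a, img_b = np.random.randn(14, 512), np.random.randn(14, 512)
print(dtw_image_distance(img_a, img_b))
```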
Related papers
- VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition [23.173085268845384]
This paper introduces VLAD-BuFF, a self-similarity based feature discounting mechanism to learn burst-aware features within end-to-end VPR training.
We benchmark our method on 9 public datasets, where VLAD-BuFF sets a new state of the art.
Our method is able to maintain its high recall even for 12x reduced local feature dimensions, thus enabling fast feature aggregation without compromising on recall.
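The abstract does not spell out the discounting rule; the following is a rough sketch of one plausible self-similarity-based discounting, where each local descriptor is down-weighted by how many near-duplicates ("bursts") it has within the same image. The function name and temperature are illustrative, not the paper's.

```python
import numpy as np

def burst_discount_weights(desc, temperature=0.1):
    """Hypothetical burst-aware discounting: a descriptor with many
    highly similar neighbors in the same image ("burstiness") receives
    a small aggregation weight.

    desc: (N, D) L2-normalized local descriptors of one image.
    Returns per-descriptor weights in (0, 1].
    """
    sim = desc @ desc.T                    # pairwise cosine similarity
    burstiness = np.exp(sim / temperature).sum(axis=1)  # soft duplicate count
    return burstiness.min() / burstiness   # 1.0 for the least bursty descriptor

desc = np.random.randn(100, 128)
desc /= np.linalg.norm(desc, axis=1, keepdims=True)
weighted = desc * burst_discount_weights(desc)[:, None]  # pre-aggregation
```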
arXiv Detail & Related papers (2024-09-28T09:44:08Z)
- Vector Field Attention for Deformable Image Registration [9.852055065890479]
Deformable image registration establishes non-linear spatial correspondences between fixed and moving images.
Most existing deep learning-based methods require neural networks to encode location information in their feature maps.
We present Vector Field Attention (VFA), a novel framework that enhances the efficiency of the existing network design by enabling direct retrieval of location correspondences.
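As a hedged illustration of "direct retrieval of location correspondences", the sketch below uses the moving image's coordinate grid as the attention values, so the attention output is itself a correspondence field rather than mixed features; the actual VFA architecture differs in its details.

```python
import torch

def vector_field_attention(feat_fixed, feat_moving):
    """Attention whose values are the moving image's coordinate grid,
    so the output is directly a field of matched locations.

    feat_fixed, feat_moving: (B, C, H, W) feature maps.
    Returns: (B, 2, H, W) expected (x, y) correspondences.
    """
    b, c, h, w = feat_fixed.shape
    q = feat_fixed.flatten(2).transpose(1, 2)   # (B, HW, C) queries (fixed)
    k = feat_moving.flatten(2).transpose(1, 2)  # (B, HW, C) keys (moving)
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).view(1, h * w, 2).expand(b, -1, -1)
    attn = torch.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)  # (B, HW, HW)
    matched = attn @ coords                     # soft-argmax match location
    return matched.transpose(1, 2).reshape(b, 2, h, w)

field = vector_field_attention(torch.randn(1, 64, 16, 16),
                               torch.randn(1, 64, 16, 16))
```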
arXiv Detail & Related papers (2024-07-14T14:06:58Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
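A minimal sketch of the cross-image idea, assuming per-image global descriptors and treating the batch as one token sequence so that self-attention correlates images with each other; the dimensions and single-block design are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class CrossImageCorrelation(nn.Module):
    """Refine per-image descriptors by attending across the batch, so
    each image is represented in the context of the other images."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (B, D). Treat the batch as ONE sequence of B tokens so that
        # self-attention runs across images, not within one image.
        tokens = x.unsqueeze(0)                   # (1, B, D)
        out, _ = self.attn(tokens, tokens, tokens)
        return self.norm(out.squeeze(0) + x)      # residual refinement

refined = CrossImageCorrelation()(torch.randn(16, 512))  # batch of 16 images
```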
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
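For intuition, here is a sketch of the geometric-verification step used for re-ranking, with classical RANSAC homography fitting (via OpenCV) standing in for the paper's learnable transformer-based estimator; the `matches` variable in the usage comment is assumed given.

```python
import numpy as np
import cv2

def homography_inlier_score(pts_query, pts_cand, thresh=5.0):
    """Re-ranking score for a retrieval candidate: fit a homography to
    matched keypoints and count RANSAC inliers.

    pts_query, pts_cand: (N, 2) matched pixel coordinates in the two images.
    """
    if len(pts_query) < 4:  # a homography needs at least 4 matches
        return 0
    H, mask = cv2.findHomography(pts_query.astype(np.float32),
                                 pts_cand.astype(np.float32),
                                 cv2.RANSAC, thresh)
    return 0 if mask is None else int(mask.sum())

# Re-rank the top-k candidates from global retrieval by inlier count;
# matches[i] = (pts_in_query, pts_in_candidate_i) is assumed to be given.
# ranked = sorted(range(k), key=lambda i: -homography_inlier_score(*matches[i]))
```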
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
- Long-Term Invariant Local Features via Implicit Cross-Domain Correspondences [79.21515035128832]
We conduct a thorough analysis of the performance of current state-of-the-art feature extraction networks under various domain changes.
We propose a novel data-centric method, Implicit Cross-Domain Correspondences (iCDC)
iCDC represents the same environment with multiple Neural Radiance Fields, each fitting the scene under individual visual domains.
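The summary leaves the correspondence generation implicit; as a loose sketch under strong assumptions (known intrinsics, camera poses, and per-domain depth, e.g. rendered from the NeRF fitted to that domain), a cross-domain pixel correspondence can be obtained by unprojecting from one domain's image and reprojecting into another's.

```python
import numpy as np

def cross_domain_correspondence(uv, depth, K, T_a2w, T_w2b):
    """Unproject a pixel from a domain-A image using its depth, then
    reproject it into a domain-B image of the same scene.

    uv: (2,) pixel in image A; depth: metric depth at uv; K: (3, 3)
    intrinsics; T_a2w: (4, 4) camera-A-to-world; T_w2b: (4, 4)
    world-to-camera-B. Returns the matching pixel in image B.
    """
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    p_cam_a = depth * ray                             # 3D point in camera A
    p_world = (T_a2w @ np.append(p_cam_a, 1.0))[:3]   # into the world frame
    p_cam_b = (T_w2b @ np.append(p_world, 1.0))[:3]   # into camera B's frame
    return (K @ p_cam_b)[:2] / p_cam_b[2]             # perspective projection
```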
arXiv Detail & Related papers (2023-11-06T18:53:01Z)
- AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR), which uses visual information to localize robots, is a research hotspot in robotics.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
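The exact ShPSM rule is not given in the summary; the sketch below shows one common reading of semi-hard positive selection, picking the hardest geographic positive that is still closer to the query in descriptor space than any negative.

```python
import numpy as np

def semi_hard_positive(query_desc, pos_descs, neg_descs):
    """Select the hardest geographic positive that is still closer to
    the query than every negative, so training pairs are informative
    but unlikely to be label noise.

    query_desc: (D,); pos_descs: (P, D); neg_descs: (M, D).
    Returns the index of the selected positive.
    """
    d_pos = np.linalg.norm(pos_descs - query_desc, axis=1)
    d_neg_min = np.linalg.norm(neg_descs - query_desc, axis=1).min()
    semi_hard = np.where(d_pos < d_neg_min)[0]  # closer than every negative
    if len(semi_hard) == 0:
        return int(d_pos.argmin())              # fall back to easiest positive
    return int(semi_hard[d_pos[semi_hard].argmax()])  # hardest semi-hard one
```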
arXiv Detail & Related papers (2023-10-08T14:46:11Z)
- Self-Supervised Visual Place Recognition by Mining Temporal and Feature Neighborhoods [17.852415436033436]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification.
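Steps (2) and (3) of this loop can be sketched as follows, with `verify` an assumed geometric-verification callable and all other names illustrative:

```python
import numpy as np

def refine_positive_sets(feats, positives, temporal_nbrs, verify, k=10):
    """One expansion/contraction pass over the positive sets.

    feats: (N, D) current L2-normalized descriptors; positives: list of
    index sets; temporal_nbrs: list of temporally adjacent index sets;
    verify(i, j) -> bool: assumed geometric-verification callable.
    """
    sims = feats @ feats.T
    for i in range(len(feats)):
        knn = np.argsort(-sims[i])[1:k + 1]            # feature-space neighbors
        expanded = positives[i] | {int(j) for j in knn} | temporal_nbrs[i]
        positives[i] = {j for j in expanded if verify(i, j)}  # contraction
    return positives
```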
arXiv Detail & Related papers (2022-08-19T12:59:46Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters are then used to perform dynamic convolution on their corresponding input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
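A minimal PyTorch sketch of one such branch, assuming per-sample depthwise 3x3 filters predicted from the input itself; the generator design and sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicModalityFilter(nn.Module):
    """One modality branch: predict per-sample depthwise 3x3 filters
    from the input feature map, then convolve the same map with them."""

    def __init__(self, channels=64, k=3):
        super().__init__()
        self.k = k
        self.gen = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(channels, channels * k * k))

    def forward(self, x):  # x: (B, C, H, W), visible or thermal features
        b, c, h, w = x.shape
        filt = self.gen(x).view(b * c, 1, self.k, self.k)  # dynamic filters
        out = F.conv2d(x.reshape(1, b * c, h, w), filt,
                       padding=self.k // 2, groups=b * c)  # per-sample depthwise
        return out.view(b, c, h, w)

out = DynamicModalityFilter()(torch.randn(2, 64, 32, 32))  # thermal branch is analogous
```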
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.