AANet: Aggregation and Alignment Network with Semi-hard Positive Sample
Mining for Hierarchical Place Recognition
- URL: http://arxiv.org/abs/2310.05184v1
- Date: Sun, 8 Oct 2023 14:46:11 GMT
- Title: AANet: Aggregation and Alignment Network with Semi-hard Positive Sample
Mining for Hierarchical Place Recognition
- Authors: Feng Lu, Lijun Zhang, Shuting Dong, Baifan Chen and Chun Yuan
- Abstract summary: Visual place recognition (VPR) is one of the research hotspots in robotics, which uses visual information to locate robots.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual place recognition (VPR) is one of the research hotspots in robotics,
which uses visual information to locate robots. Recently, the hierarchical
two-stage VPR methods have become popular in this field due to the trade-off
between accuracy and efficiency. These methods retrieve the top-k candidate
images using the global features in the first stage, then re-rank the
candidates by matching the local features in the second stage. However, they
usually require additional algorithms (e.g. RANSAC) for geometric consistency
verification in re-ranking, which is time-consuming. Here we propose a
Dynamically Aligning Local Features (DALF) algorithm to align the local
features under spatial constraints. It is significantly more efficient than the
methods that need geometric consistency verification. We present a unified
network capable of extracting global features for retrieving candidates via an
aggregation module and aligning local features for re-ranking via the DALF
alignment module. We call this network AANet. Meanwhile, many works use the
simplest positive samples in triplet for weakly supervised training, which
limits the ability of the network to recognize harder positive pairs. To
address this issue, we propose a Semi-hard Positive Sample Mining (ShPSM)
strategy to select appropriate hard positive images for training more robust
VPR networks. Extensive experiments on four benchmark VPR datasets show that
the proposed AANet can outperform several state-of-the-art methods with less
time consumption. The code is released at https://github.com/Lu-Feng/AANet.
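
The two-stage scheme described in the abstract (retrieve top-k candidates with global features, then re-rank them with local features) can be illustrated with a minimal NumPy sketch. This is not the AANet implementation: the function names, the descriptor shapes, and the mutual nearest-neighbour matching score used in the second stage are all assumptions chosen for illustration. In particular, the second stage is a generic stand-in and does not reproduce the DALF alignment, whose details are not given here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (near) unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def retrieve_top_k(query_global, db_global, k):
    """Stage 1: rank database images by global-descriptor cosine similarity."""
    sims = l2_normalize(db_global) @ l2_normalize(query_global)
    return np.argsort(-sims)[:k]


def mutual_nn_score(query_local, cand_local):
    """Stage 2 stand-in: count mutual nearest-neighbour local matches.

    Hypothetical placeholder, NOT the paper's DALF alignment.
    """
    sim = l2_normalize(query_local) @ l2_normalize(cand_local).T
    q_to_c = sim.argmax(axis=1)  # best candidate feature for each query feature
    c_to_q = sim.argmax(axis=0)  # best query feature for each candidate feature
    mutual = c_to_q[q_to_c] == np.arange(sim.shape[0])
    return int(mutual.sum())


def hierarchical_search(query_global, query_local, db_global, db_local, k=3):
    """Retrieve k candidates by global features, then re-rank them by local matches."""
    candidates = retrieve_top_k(query_global, db_global, k)
    scores = [mutual_nn_score(query_local, db_local[i]) for i in candidates]
    order = np.argsort(-np.asarray(scores), kind="stable")
    return candidates[order]
```

Only the k retrieved candidates reach the local-matching stage, so the second stage's cost depends on k and not on the database size. That is the accuracy/efficiency trade-off the abstract refers to.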
Related papers
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
(2024-02-25T13:22:17Z)
- VICRegL: Self-Supervised Learning of Local Visual Features [34.92750644059916]
This paper explores the fundamental trade-off between learning local and global features.
A new method called VICRegL is proposed that learns good global and local features simultaneously.
We demonstrate strong performance on linear classification and segmentation transfer tasks.
(2022-10-04T12:54:25Z)
- Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data [16.540900776820084]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method outperforms self-supervised baselines in recall rate, robustness, and heading diversity.
(2022-08-19T12:59:46Z)
- Tightly Coupled Learning Strategy for Weakly Supervised Hierarchical Place Recognition [0.09558392439655011]
We propose a tightly coupled learning (TCL) strategy to train triplet models.
It combines global and local descriptors for joint optimization.
Our lightweight unified model is better than several state-of-the-art methods.
(2022-02-14T03:20:39Z)
- Efficient Person Search: An Anchor-Free Approach [86.45858994806471]
Person search aims to simultaneously localize and identify a query person from realistic, uncropped images.
To achieve this goal, state-of-the-art models typically add a re-id branch upon two-stage detectors like Faster R-CNN.
In this work, we present an anchor-free approach to efficiently tackling this challenging task, by introducing the following dedicated designs.
(2021-09-01T07:01:33Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks. The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
(2021-07-22T03:10:51Z)
- STA-VPR: Spatio-temporal Alignment for Visual Place Recognition [17.212503755962757]
We propose an adaptive dynamic time warping algorithm to align local features from the spatial domain while measuring the distance between two images.
A local matching DTW algorithm is applied to perform image sequence matching based on temporal alignment.
The results show that the proposed method significantly improves the CNN-based methods.
(2021-03-25T03:27:42Z)
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
(2021-02-22T06:19:45Z)
- MRDet: A Multi-Head Network for Accurate Oriented Object Detection in Aerial Images [51.227489316673484]
We propose an arbitrary-oriented region proposal network (AO-RPN) to generate oriented proposals transformed from horizontal anchors.
To obtain accurate bounding boxes, we decouple the detection task into multiple subtasks and propose a multi-head network.
Each head is specially designed to learn the features optimal for the corresponding task, which allows our network to detect objects accurately.
(2020-12-24T06:36:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.