MixVPR: Feature Mixing for Visual Place Recognition
- URL: http://arxiv.org/abs/2303.02190v1
- Date: Fri, 3 Mar 2023 19:24:03 GMT
- Title: MixVPR: Feature Mixing for Visual Place Recognition
- Authors: Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
- Abstract summary: Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving.
We introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features.
We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks.
- Score: 3.6739949215165164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition (VPR) is a crucial part of mobile robotics and
autonomous driving as well as other computer vision tasks. It refers to the
process of identifying a place depicted in a query image using only computer
vision. At large scale, repetitive structures, weather and illumination changes
pose a real challenge, as appearances can drastically change over time. Along
with tackling these challenges, an efficient VPR technique must also be
practical in real-world scenarios where latency matters. To address this, we
introduce MixVPR, a new holistic feature aggregation technique that takes
feature maps from pre-trained backbones as a set of global features. Then, it
incorporates a global relationship between elements in each feature map in a
cascade of feature mixing, eliminating the need for local or pyramidal
aggregation as done in NetVLAD or TransVPR. We demonstrate the effectiveness of
our technique through extensive experiments on multiple large-scale benchmarks.
Our method outperforms all existing techniques by a large margin while having
less than half the number of parameters compared to CosPlace and NetVLAD. We
achieve a new all-time high recall@1 score of 94.6% on Pitts250k-test, 88.0% on
MapillarySLS, and more importantly, 58.4% on Nordland. Finally, our method
outperforms two-stage retrieval techniques such as Patch-NetVLAD, TransVPR and
SuperGLUE all while being orders of magnitude faster. Our code and trained
models are available at https://github.com/amaralibey/MixVPR.
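To make the aggregation concrete, here is a minimal PyTorch sketch of the feature-mixing idea described in the abstract: the backbone's feature maps are flattened into rows, a cascade of row-wise MLP blocks with residual connections mixes information across all spatial positions, and two small projections compress the result into a compact, L2-normalized global descriptor. The depth, layer sizes, and output dimensionality below are illustrative assumptions rather than the paper's exact configuration; the official implementation is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureMixerLayer(nn.Module):
    """One feature-mixing block: a row-wise MLP with a residual connection.
    Each row is a flattened h*w feature map, so the MLP relates every spatial
    position to every other one (no local or pyramidal aggregation needed)."""
    def __init__(self, n_positions):
        super().__init__()
        self.mix = nn.Sequential(
            nn.LayerNorm(n_positions),
            nn.Linear(n_positions, n_positions),
            nn.ReLU(),
            nn.Linear(n_positions, n_positions),
        )

    def forward(self, x):          # x: (B, c, h*w)
        return x + self.mix(x)

class FeatureMixerHead(nn.Module):
    """Hypothetical aggregation head: a cascade of mixer blocks, then channel
    and row projections that shrink (c, h*w) down to a flat descriptor."""
    def __init__(self, channels=1024, hw=400, depth=4,
                 out_channels=512, out_rows=4):
        super().__init__()
        self.mixers = nn.Sequential(*[FeatureMixerLayer(hw) for _ in range(depth)])
        self.channel_proj = nn.Linear(channels, out_channels)
        self.row_proj = nn.Linear(hw, out_rows)

    def forward(self, fmap):                       # fmap: (B, c, h, w)
        x = fmap.flatten(2)                        # (B, c, h*w)
        x = self.mixers(x)                         # cascade of feature mixing
        x = self.channel_proj(x.transpose(1, 2))   # (B, h*w, out_channels)
        x = self.row_proj(x.transpose(1, 2))       # (B, out_channels, out_rows)
        return F.normalize(x.flatten(1), p=2, dim=1)  # compact global descriptor
```

Retrieval then reduces to a nearest-neighbour search over these descriptors (e.g. by dot product), which is what makes the one-stage approach fast at query time compared with two-stage re-ranking.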
Related papers
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
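The cross-image correlation in the CricaVPR entry above can be pictured as self-attention applied across a batch of images rather than within one image. A minimal sketch under that reading, with a single off-the-shelf attention layer and illustrative dimensions (CricaVPR's actual architecture is more elaborate):

```python
import torch
import torch.nn as nn

class CrossImageAttention(nn.Module):
    """Refine per-image descriptors by letting each image in a batch
    attend to all the others (the batch is treated as a sequence)."""
    def __init__(self, dim=1024, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):                 # feats: (B, dim), one per image
        x = feats.unsqueeze(0)                # (1, B, dim): images as tokens
        out, _ = self.attn(x, x, x)           # correlate images within the batch
        return self.norm(out.squeeze(0) + feats)   # residual + normalization
```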
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits a homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
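For context on the DHE entry above: two-stage VPR systems traditionally re-rank retrieved candidates with RANSAC-based geometric verification, the step DHE replaces with a fast, learnable homography fit on dense backbone features. A minimal sketch of the classical verification being replaced, assuming keypoint matches are already available and scoring candidates by inlier count:

```python
import cv2
import numpy as np

def homography_inliers(kpts_query, kpts_cand, matches, reproj_thresh=5.0):
    """Fit a homography to matched keypoints with RANSAC and return the
    inlier count, used as a geometric-consistency score for re-ranking."""
    if len(matches) < 4:                     # a homography needs >= 4 matches
        return 0
    src = np.float32([kpts_query[i] for i, _ in matches]).reshape(-1, 1, 2)
    dst = np.float32([kpts_cand[j] for _, j in matches]).reshape(-1, 1, 2)
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    return 0 if inlier_mask is None else int(inlier_mask.sum())

# Re-rank the top-k retrieved candidates by their inlier count.
```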
- Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition [72.35438297011176]
We propose a novel method to realize seamless adaptation of pre-trained models for visual place recognition (VPR).
Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method.
Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time.
arXiv Detail & Related papers (2024-02-22T12:55:01Z)
- AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR), which uses visual information to localize robots, is a research hotspot in robotics.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
arXiv Detail & Related papers (2023-10-08T14:46:11Z)
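The AANet summary above gives only the name of the mining strategy. One plausible reading of semi-hard positive mining, sketched below as an assumption rather than AANet's actual rule, is to keep positives at moderate descriptor distance from the anchor, discarding both trivially easy ones and extreme outliers:

```python
import torch

def mine_semi_hard_positives(anchor, positives, skip_frac=0.2):
    """anchor: (d,), positives: (P, d), both L2-normalized descriptors.
    Returns indices of the moderately hard positives."""
    dists = torch.cdist(anchor.unsqueeze(0), positives).squeeze(0)   # (P,)
    order = dists.argsort()                        # closest (easiest) first
    lo = int(len(order) * skip_frac)               # drop trivially easy positives
    hi = max(lo + 1, int(len(order) * (1 - skip_frac)))  # drop likely outliers
    return order[lo:hi]
```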
- ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer [13.0858576267115]
We present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects.
ClusVPR introduces a unique paradigm called the Clustering-based Weighted Transformer Network (CWTNet).
We also introduce the optimized-VLAD layer that significantly reduces the number of parameters and enhances model efficiency.
arXiv Detail & Related papers (2023-10-06T09:01:15Z)
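The optimized-VLAD layer in the ClusVPR entry above is a parameter-reduced take on NetVLAD-style aggregation, which the main abstract also uses as a baseline. For reference, a minimal standard NetVLAD layer (soft cluster assignment plus residual pooling); the cluster count and feature dimension are illustrative, and ClusVPR's specific reductions are not shown:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetVLAD(nn.Module):
    """Aggregate local features into a global descriptor by soft-assigning
    them to learned clusters and pooling the residuals to each centroid."""
    def __init__(self, num_clusters=64, dim=512):
        super().__init__()
        self.assign = nn.Conv2d(dim, num_clusters, kernel_size=1)
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):                               # x: (B, dim, H, W)
        a = F.softmax(self.assign(x).flatten(2), dim=1)     # (B, K, H*W)
        feats = x.flatten(2)                                # (B, D, H*W)
        # sum_n a_k(n) * (x_n - c_k), without materializing all residuals
        vlad = torch.einsum('bkn,bdn->bkd', a, feats) \
             - a.sum(-1).unsqueeze(-1) * self.centroids     # (B, K, D)
        vlad = F.normalize(vlad, p=2, dim=2)                # intra-normalization
        return F.normalize(vlad.flatten(1), p=2, dim=1)     # (B, K*D)
```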
- A-MuSIC: An Adaptive Ensemble System For Visual Place Recognition In Changing Environments [22.58641358408613]
Visual place recognition (VPR) is an essential component of robot navigation and localization systems.
No single VPR technique excels in every environmental condition.
We introduce an adaptive VPR system dubbed Adaptive Multi-Self Identification and Correction (A-MuSIC).
A-MuSIC matches or beats state-of-the-art VPR performance across all tested benchmark datasets.
arXiv Detail & Related papers (2023-03-24T19:25:22Z)
- DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition [62.95223898214866]
We explore effective Vision Transformers to pursue a preferable trade-off between the computational complexity and size of the attended receptive field.
With a pyramid architecture, we construct a Multi-Scale Dilated Transformer (DilateFormer) by stacking Multi-Scale Dilated Attention (MSDA) blocks at low-level stages and global multi-head self-attention blocks at high-level stages.
Our experiment results show that our DilateFormer achieves state-of-the-art performance on various vision tasks.
arXiv Detail & Related papers (2023-02-03T14:59:31Z)
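A minimal sketch of the dilated attention underlying the MSDA blocks in the DilateFormer entry above: each query position attends only to a small neighbourhood sampled at a dilation rate, so the attended receptive field grows without the cost of global attention. Single-head, one dilation, and without the usual projections, as an illustration of the idea rather than the paper's block:

```python
import torch
import torch.nn.functional as F

def dilated_window_attention(q, k, v, dilation=2):
    """q, k, v: (B, C, H, W). Each spatial query attends to its 3x3
    neighbourhood sampled with the given dilation (zero-padded at borders)."""
    B, C, H, W = q.shape
    # Gather the 3x3 dilated neighbourhood of every position: (B, C, 9, H*W)
    k_n = F.unfold(k, 3, dilation=dilation, padding=dilation).view(B, C, 9, H * W)
    v_n = F.unfold(v, 3, dilation=dilation, padding=dilation).view(B, C, 9, H * W)
    q = q.view(B, C, 1, H * W)
    attn = (q * k_n).sum(dim=1, keepdim=True) / C ** 0.5   # (B, 1, 9, H*W)
    attn = attn.softmax(dim=2)                             # over the 9 neighbours
    out = (attn * v_n).sum(dim=2)                          # (B, C, H*W)
    return out.view(B, C, H, W)
```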
- A Faster, Lighter and Stronger Deep Learning-Based Approach for Place Recognition [7.9400442516053475]
We propose a faster, lighter and stronger approach that generates models with fewer parameters and lower inference latency.
We design RepVGG-lite as the backbone network in our architecture; it is more discriminative than other general networks in the place recognition task.
Our system has 14 times fewer parameters than Patch-NetVLAD, 6.8 times lower theoretical FLOPs, and runs 21 and 33 times faster in feature extraction and feature matching, respectively.
arXiv Detail & Related papers (2022-11-27T15:46:53Z)
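RepVGG-style backbones such as the RepVGG-lite named above get their inference speed from structural re-parameterization: a training-time block with parallel 3x3, 1x1, and identity branches is merged into one 3x3 convolution for deployment. A minimal sketch of the merge, assuming stride 1, equal input/output channels, biases on both convs, and batch-norm folding already done (RepVGG folds BN into each branch first):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def merge_repvgg_branches(conv3: nn.Conv2d, conv1: nn.Conv2d) -> nn.Conv2d:
    """Return a single 3x3 conv computing conv3(x) + conv1(x) + x."""
    out_c, in_c = conv3.out_channels, conv3.in_channels
    # Centre the 1x1 kernel inside a 3x3 kernel and add the two conv branches
    w = conv3.weight.data + F.pad(conv1.weight.data, [1, 1, 1, 1])
    b = conv3.bias.data + conv1.bias.data
    idx = torch.arange(in_c)
    w[idx, idx, 1, 1] += 1.0        # identity branch as a centred delta kernel
    merged = nn.Conv2d(in_c, out_c, 3, padding=1)
    merged.weight.data, merged.bias.data = w, b
    return merged
```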
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities (color and thermal) in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
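The center-and-scale formulation in the entry above can be sketched as two small convolutional heads on the fused multispectral features: one predicts a per-pixel probability of a pedestrian centre, the other regresses the object's scale at that location, with no anchor boxes involved. Channel sizes below are illustrative assumptions:

```python
import torch.nn as nn

class CenterScaleHead(nn.Module):
    """Anchor-free detection head: a centre heatmap plus per-pixel scale."""
    def __init__(self, in_channels=256):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_channels, in_channels, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(in_channels, out_ch, kernel_size=1))
        self.center = branch(1)    # probability of a pedestrian centre
        self.scale = branch(2)     # e.g. log-height and log-width

    def forward(self, fused):      # fused: (B, C, H, W) multispectral features
        return self.center(fused).sigmoid(), self.scale(fused)
```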
This list is automatically generated from the titles and abstracts of the papers on this site.