SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition
- URL: http://arxiv.org/abs/2502.16601v1
- Date: Sun, 23 Feb 2025 15:01:09 GMT
- Title: SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition
- Authors: Feng Lu, Tong Jin, Xiangyuan Lan, Lijun Zhang, Yunpeng Liu, Yaowei Wang, Chun Yuan,
- Abstract summary: Recent studies show that the visual place recognition (VPR) method using pre-trained visual foundation models can achieve promising performance.<n>We propose a novel method to realize seamless adaptation of foundation models to VPR.<n>In pursuit of higher efficiency and better performance, we propose an extension of the SelaVPR, called SelaVPR++.
- Score: 69.58329995485158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies show that the visual place recognition (VPR) method using pre-trained visual foundation models can achieve promising performance. In our previous work, we propose a novel method to realize seamless adaptation of foundation models to VPR (SelaVPR). This method can produce both global and local features that focus on discriminative landmarks to recognize places for two-stage VPR by a parameter-efficient adaptation approach. Although SelaVPR has achieved competitive results, we argue that the previous adaptation is inefficient in training time and GPU memory usage, and the re-ranking paradigm is also costly in retrieval latency and storage usage. In pursuit of higher efficiency and better performance, we propose an extension of the SelaVPR, called SelaVPR++. Concretely, we first design a parameter-, time-, and memory-efficient adaptation method that uses lightweight multi-scale convolution (MultiConv) adapters to refine intermediate features from the frozen foundation backbone. This adaptation method does not back-propagate gradients through the backbone during training, and the MultiConv adapter facilitates feature interactions along the spatial axes and introduces proper local priors, thus achieving higher efficiency and better performance. Moreover, we propose an innovative re-ranking paradigm for more efficient VPR. Instead of relying on local features for re-ranking, which incurs huge overhead in latency and storage, we employ compact binary features for initial retrieval and robust floating-point (global) features for re-ranking. To obtain such binary features, we propose a similarity-constrained deep hashing method, which can be easily integrated into the VPR pipeline. Finally, we improve our training strategy and unify the training protocol of several common training datasets to merge them for better training of VPR models. Extensive experiments show that ......
Related papers
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.<n>Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition [22.615830919860777]
This paper presents an efficient visual recognition paradigm, called Dynamic Adapter (Dyn-Adapter)
We devise a dynamic architecture with balanced early heads for multi-level feature extraction, along with adaptive training strategy.
We reduce FLOPs during inference by 50%, while maintaining or even yielding higher recognition accuracy.
arXiv Detail & Related papers (2024-07-19T13:33:38Z) - LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation [64.34935748707673]
Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors.
We propose a novel method of Learning Resampling (termed LeRF) which takes advantage of both the structural priors learned by DNNs and the locally continuous assumption.
LeRF assigns spatially varying resampling functions to input image pixels and learns to predict the shapes of these resampling functions with a neural network.
arXiv Detail & Related papers (2024-07-13T16:09:45Z) - EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We present an effective approach to harness the potential of a foundation model for Visual Place Recognition.<n>We show that features extracted from self-attention layers can act as a powerful re-ranker for VPR, even in a zero-shot setting.<n>Our method also demonstrates exceptional robustness and generalization, setting new state-of-the-art performance.
arXiv Detail & Related papers (2024-05-28T11:24:41Z) - PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation [61.57833648734164]
We propose a novel Parallel Yielding Re-Activation (PYRA) method for training-inference efficient task adaptation.
PYRA outperforms all competing methods under both low compression rate and high compression rate.
arXiv Detail & Related papers (2024-03-14T09:06:49Z) - Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition [72.35438297011176]
We propose a novel method to realize seamless adaptation of pre-trained models for visual place recognition (VPR)
Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method.
Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time.
arXiv Detail & Related papers (2024-02-22T12:55:01Z) - Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z) - Residual Local Feature Network for Efficient Super-Resolution [20.62809970985125]
In this work, we propose a novel Residual Local Feature Network (RLFN)
The main idea is using three convolutional layers for residual local feature learning to simplify feature aggregation.
In addition, we won the first place in the runtime track of the NTIRE 2022 efficient super-resolution challenge.
arXiv Detail & Related papers (2022-05-16T08:46:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.