EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition
- URL: http://arxiv.org/abs/2506.13133v1
- Date: Mon, 16 Jun 2025 06:40:12 GMT
- Title: EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition
- Authors: Bingxi Liu, Hao Chen, Shiyi Guo, Yihong Wu, Jinqiang Cui, Hong Zhang
- Abstract summary: Visual Place Recognition (VPR) is a scene-oriented image retrieval problem in computer vision. We propose a novel, simple re-ranking method that refines global features through a Mixture-of-Features (MoF) approach under embodied constraints.
- Score: 9.75969669445091
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Place Recognition (VPR) is a scene-oriented image retrieval problem in computer vision in which re-ranking based on local features is commonly employed to improve performance. In robotics, VPR is also referred to as Loop Closure Detection, which emphasizes spatial-temporal verification within a sequence. However, designing local features specifically for VPR is impractical, and relying on motion sequences imposes limitations. Inspired by these observations, we propose a novel, simple re-ranking method that refines global features through a Mixture-of-Features (MoF) approach under embodied constraints. First, we analyze the practical feasibility of embodied constraints in VPR and categorize them according to what existing datasets provide: GPS tags, sequential timestamps, local feature matching, and self-similarity matrices. We then propose a learning-based MoF weight-computation approach trained with a multi-metric loss function. Experiments demonstrate that our method improves on state-of-the-art (SOTA) performance on public datasets with minimal additional computational overhead. For instance, with only 25 KB of additional parameters and a processing time of 10 microseconds per frame, our method achieves a 0.9% improvement over a DINOv2-based baseline on the Pitts-30k test set.
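To make the recipe concrete, here is a minimal PyTorch sketch of MoF re-ranking as the abstract describes it: the query's global feature is mixed with its top-k retrieved neighbors using weights predicted from per-neighbor cues (feature similarity plus an embodied-constraint score), and candidates are re-scored against the refined feature. The two-cue input, the tiny MLP head, and all shapes are illustrative assumptions, not the authors' exact design.

```python
# A minimal sketch of Mixture-of-Features (MoF) re-ranking, assuming the
# abstract's general recipe. The weight head, its inputs, and all shapes
# are hypothetical; the paper reports only ~25 KB of extra parameters,
# so a deliberately tiny head is at least plausible.
import torch
import torch.nn.functional as F

class MoFReranker(torch.nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Maps per-neighbor cues (cosine similarity to the query plus an
        # embodied-constraint score, e.g. GPS/sequence agreement) to a weight.
        self.weight_head = torch.nn.Sequential(
            torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
        )

    def forward(self, q, neighbors, constraint):
        # q: (dim,) query global feature; neighbors: (k, dim) retrieved
        # candidates; constraint: (k,) embodied-constraint scores in [0, 1].
        sim = F.cosine_similarity(q.unsqueeze(0), neighbors, dim=1)   # (k,)
        cues = torch.stack([sim, constraint], dim=1)                  # (k, 2)
        w = torch.softmax(self.weight_head(cues).squeeze(1), dim=0)   # (k,)
        # Refine the query feature as a weighted mixture, then re-score.
        refined = F.normalize(q + (w.unsqueeze(1) * neighbors).sum(0), dim=0)
        return F.cosine_similarity(refined.unsqueeze(0), neighbors, dim=1)

reranker = MoFReranker()
scores = reranker(torch.randn(768), torch.randn(10, 768), torch.rand(10))
print(scores.argsort(descending=True))  # re-ranked candidate order
```

The real weight computation and the multi-metric loss used to train it are specified in the paper; this sketch only illustrates where a learned mixture fits in the retrieval-then-re-rank pipeline.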
Related papers
- AHDMIL: Asymmetric Hierarchical Distillation Multi-Instance Learning for Fast and Accurate Whole-Slide Image Classification [51.525891360380285]
AHDMIL is an Asymmetric Hierarchical Distillation Multi-Instance Learning framework. It eliminates irrelevant patches through a two-step training process. It consistently outperforms previous state-of-the-art methods in both classification performance and inference speed.
arXiv Detail & Related papers (2025-08-07T07:47:16Z) - SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition [69.58329995485158]
Recent studies show that visual place recognition (VPR) methods using pre-trained visual foundation models can achieve promising performance. We propose a novel method to realize seamless adaptation of foundation models to VPR. In pursuit of higher efficiency and better performance, we propose an extension of SelaVPR, called SelaVPR++.
arXiv Detail & Related papers (2025-02-23T15:01:09Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts. Thanks to this design, ALoRE adds negligible extra parameters and can be effortlessly merged into the frozen backbone.
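As a rough illustration of the parameterization this summary describes, the sketch below builds a weight update as a sum of Kronecker products of small "expert" factors and merges it into a frozen weight. The factor shapes and the merge step are assumptions for illustration, not ALoRE's exact formulation.

```python
# A hedged sketch of Kronecker-product low-rank experts: each expert
# contributes kron(A_i, B_i), and the summed update merges into the
# frozen backbone weight at inference time.
import torch

def expert_delta(A_list, B_list):
    # Sum of Kronecker products; small factors expand to a full-size update.
    return sum(torch.kron(A, B) for A, B in zip(A_list, B_list))

d_out, d_in, n_experts = 768, 768, 4
W_frozen = torch.randn(d_out, d_in)                       # frozen backbone weight
A = [torch.randn(4, 4) * 0.01 for _ in range(n_experts)]  # tiny expert factors
B = [torch.randn(d_out // 4, d_in // 4) * 0.01 for _ in range(n_experts)]
W_merged = W_frozen + expert_delta(A, B)                  # re-parameterized weight
x = torch.randn(d_in)
# Merging is exact up to floating-point error: (W + delta) x == W x + delta x.
print(torch.allclose(W_merged @ x, W_frozen @ x + expert_delta(A, B) @ x, atol=1e-4))
```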
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition [23.173085268845384]
This paper introduces VLAD-BuFF, a self-similarity-based feature discounting mechanism that learns burst-aware features within end-to-end VPR training.
We benchmark our method on 9 public datasets, where VLAD-BuFF sets a new state of the art.
Our method is able to maintain its high recall even for 12x reduced local feature dimensions, thus enabling fast feature aggregation without compromising on recall.
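The burst-discounting idea can be sketched as follows: local features that have many near-duplicates within the same image (a "burst") are down-weighted before aggregation, so repeated structures do not dominate the descriptor. The hard-threshold count used here is an assumption; the paper learns the discounting end to end.

```python
# A minimal sketch of self-similarity-based burst discounting. The
# threshold rule is a classical stand-in, not VLAD-BuFF's learned weighting.
import torch
import torch.nn.functional as F

def burst_discount(feats, thresh=0.8):
    # feats: (n, d) local features from one image.
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.T                       # (n, n) self-similarity
    burst_size = (sim > thresh).float().sum(1)  # near-duplicate count per feature
    weights = burst_size.rsqrt()                # large bursts get discounted
    return feats * weights.unsqueeze(1)

pooled = burst_discount(torch.randn(100, 128)).sum(0)  # burst-aware aggregation
print(pooled.shape)
```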
arXiv Detail & Related papers (2024-09-28T09:44:08Z) - EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We present an effective approach to harnessing the potential of a foundation model for Visual Place Recognition. We show that features extracted from self-attention layers can act as a powerful re-ranker for VPR, even in a zero-shot setting. Our method also demonstrates exceptional robustness and generalization, setting new state-of-the-art performance.
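A hedged sketch of the zero-shot re-ranking idea: per-patch features taken from a ViT's self-attention layers are matched between the query and each candidate, and the mutual-nearest-neighbor count serves as the re-rank score. The choice of features and this particular matching rule are assumptions, not necessarily EffoVPR's exact procedure.

```python
# Counting mutual nearest-neighbor matches between per-patch features as
# a zero-shot re-ranking score; feature source and rule are assumptions.
import torch
import torch.nn.functional as F

def rerank_score(q_patches, c_patches):
    # q_patches: (nq, d), c_patches: (nc, d) per-patch ViT features.
    q = F.normalize(q_patches, dim=1)
    c = F.normalize(c_patches, dim=1)
    sim = q @ c.T
    fwd = sim.argmax(1)                 # best candidate patch per query patch
    bwd = sim.argmax(0)                 # best query patch per candidate patch
    mutual = bwd[fwd] == torch.arange(len(q))
    return int(mutual.sum())            # more mutual matches => better place match

print(rerank_score(torch.randn(196, 384), torch.randn(196, 384)))
```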
arXiv Detail & Related papers (2024-05-28T11:24:41Z) - Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits a homography for fast, learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
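The DHE paper fits the homography with a learnable transformer; as a classical stand-in for the same geometric-verification step, the sketch below fits a homography to matched keypoints with OpenCV's RANSAC and scores a candidate by its inlier count.

```python
# Classical homography-based verification (RANSAC), standing in for the
# learnable DHE network: more inliers => stronger geometric consistency.
import cv2
import numpy as np

def verify(query_pts, cand_pts):
    # query_pts, cand_pts: (n, 2) matched keypoint coordinates, n >= 4.
    H, mask = cv2.findHomography(query_pts, cand_pts, cv2.RANSAC, 3.0)
    return 0 if H is None else int(mask.sum())   # inlier count as match score

pts = np.random.rand(50, 2).astype(np.float32) * 224
print(verify(pts, pts + np.float32([5, 2])))     # pure translation: many inliers
```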
arXiv Detail & Related papers (2024-02-25T13:22:17Z) - Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition [72.35438297011176]
We propose a novel method to realize seamless adaptation of pre-trained models for visual place recognition (VPR).
Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method.
Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time.
arXiv Detail & Related papers (2024-02-22T12:55:01Z) - AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR), which uses visual information to localize robots, is a research hotspot in robotics.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
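A loose sketch of semi-hard positive mining: among an anchor's geographic positives, select a training positive that is neither trivially easy nor pathologically hard in feature space. The "middle band" rule below is an assumption about what "semi-hard" means here, not the ShPSM strategy itself.

```python
# Hypothetical semi-hard positive selection: pick the hardest positive
# whose distance to the anchor falls inside a middle band.
import torch

def mine_semi_hard_positive(anchor, positives, lo=0.2, hi=0.8):
    # anchor: (d,); positives: (p, d) features of geographic positives.
    d = torch.cdist(anchor.unsqueeze(0), positives).squeeze(0)  # (p,)
    rng = d.max() - d.min()
    band = (d >= d.min() + lo * rng) & (d <= d.min() + hi * rng)
    idx = d.masked_fill(~band, -1.0).argmax()   # hardest positive inside band
    return positives[idx]

print(mine_semi_hard_positive(torch.randn(128), torch.randn(8, 128)).shape)
```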
arXiv Detail & Related papers (2023-10-08T14:46:11Z) - MixVPR: Feature Mixing for Visual Place Recognition [3.6739949215165164]
Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving.
We introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features.
We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks.
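In the spirit of the description above, this sketch flattens a backbone feature map to (channels, positions) and applies residual MLPs that mix information across all spatial positions before pooling to a global descriptor. Layer sizes, depth, and the final pooling are assumptions rather than MixVPR's exact architecture.

```python
# A compact sketch of holistic feature mixing over a flattened feature map.
import torch

class FeatureMixer(torch.nn.Module):
    def __init__(self, tokens):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.LayerNorm(tokens),
            torch.nn.Linear(tokens, tokens), torch.nn.ReLU(),
            torch.nn.Linear(tokens, tokens),
        )

    def forward(self, x):          # x: (batch, channels, tokens)
        return x + self.mlp(x)     # residual mixing across spatial positions

feat = torch.randn(2, 512, 20 * 20)               # flattened CNN feature map
mix = torch.nn.Sequential(*[FeatureMixer(400) for _ in range(4)])
desc = mix(feat).mean(dim=2)                      # (2, 512) global descriptor
print(desc.shape)
```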
arXiv Detail & Related papers (2023-03-03T19:24:03Z) - Self-Supervised Place Recognition by Refining Temporal and Featural Pseudo Labels from Panoramic Data [16.540900776820084]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method outperforms self-supervised baselines in recall rate, robustness, and heading diversity.
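A hedged sketch of pseudo-label construction from temporal and featural neighborhoods: each frame's positive set starts from its temporal window and is grown with feature-space nearest neighbors, approximating the unknown spatial neighborhood. The simple union rule is an assumption; TF-VPR refines these labels during training.

```python
# Hypothetical pseudo-positive mining from temporal + feature neighborhoods.
import torch
import torch.nn.functional as F

def pseudo_positives(feats, i, window=2, k=5):
    # feats: (n, d) per-frame descriptors ordered by timestamp.
    n = feats.shape[0]
    temporal = set(range(max(0, i - window), min(n, i + window + 1))) - {i}
    sim = F.cosine_similarity(feats[i].unsqueeze(0), feats, dim=1)
    sim[i] = -1                                    # exclude the frame itself
    featural = set(sim.topk(k).indices.tolist())   # feature-space neighbors
    return sorted(temporal | featural)             # pseudo-label positive set

print(pseudo_positives(torch.randn(100, 256), i=50))
```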
arXiv Detail & Related papers (2022-08-19T12:59:46Z) - STA-VPR: Spatio-temporal Alignment for Visual Place Recognition [17.212503755962757]
We propose an adaptive dynamic time warping (DTW) algorithm that aligns local features in the spatial domain while measuring the distance between two images.
A local matching DTW algorithm is applied to perform image sequence matching based on temporal alignment.
The results show that the proposed method significantly improves on CNN-based methods.
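For reference, vanilla dynamic time warping over a pairwise distance matrix looks like the sketch below; STA-VPR's adaptive DTW modifies the cost with an alignment-dependent penalty, so this vanilla version is only a stand-in for the alignment step.

```python
# Plain DTW over ordered local features (e.g., column-wise image features
# for spatial alignment, or per-frame descriptors for sequence matching).
import numpy as np

def dtw_distance(X, Y):
    # X: (n, d), Y: (m, d) ordered feature sequences.
    cost = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # (n, m)
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1]
            )
    return D[n, m] / (n + m)        # length-normalized alignment distance

print(dtw_distance(np.random.rand(10, 64), np.random.rand(12, 64)))
```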
arXiv Detail & Related papers (2021-03-25T03:27:42Z) - Early Bird: Loop Closures from Opposing Viewpoints for Perceptually-Aliased Indoor Environments [35.663671249819124]
We present novel research that simultaneously addresses viewpoint change and perceptual aliasing.
We show that our integration of VPR with SLAM significantly boosts the performance of VPR, feature correspondence, and pose graph submodules.
For the first time, we demonstrate a localization system capable of state-of-the-art performance despite perceptual aliasing and extreme 180-degree-rotated viewpoint change.
arXiv Detail & Related papers (2020-10-03T20:18:55Z)