Related papers: HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition

URL: http://arxiv.org/abs/2506.04764v1
Date: Thu, 05 Jun 2025 08:47:15 GMT
Title: HypeVPR: Exploring Hyperbolic Space for Perspective to Equirectangular Visual Place Recognition
Authors: Suhan Woo, Seongwon Lee, Jinwoo Jang, Euntai Kim
Abstract summary: We introduce HypeVPR, a novel hierarchical embedding framework in hyperbolic space. HypeVPR is designed to address the unique challenges of perspective-to-equirectangular (P2E) VPR.
Score: 16.46501527058266
License: http://creativecommons.org/licenses/by/4.0/
Abstract: When applying Visual Place Recognition (VPR) to real-world mobile robots and similar applications, the perspective-to-equirectangular (P2E) formulation naturally emerges as a suitable approach to accommodate diverse query images captured from various viewpoints. In this paper, we introduce HypeVPR, a novel hierarchical embedding framework in hyperbolic space, designed to address the unique challenges of P2E VPR. The key idea behind HypeVPR is that visual environments captured by panoramic views exhibit inherent hierarchical structures. To leverage this property, we employ hyperbolic space to represent hierarchical feature relationships and preserve distance properties within the feature space. To achieve this, we propose a hierarchical feature aggregation mechanism that organizes local-to-global feature representations within hyperbolic space. Additionally, HypeVPR adopts an efficient coarse-to-fine search strategy, optimally balancing speed and accuracy to ensure robust matching, even between descriptors from different image types. This approach enables HypeVPR to outperform state-of-the-art methods while significantly reducing retrieval time, achieving up to 5x faster retrieval across diverse benchmark datasets. The code and models will be released at https://github.com/suhan-woo/HypeVPR.git.
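
The abstract does not say which model of hyperbolic space HypeVPR uses or how descriptors are compared. As a rough illustration only, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. The function name `poincare_distance` and the choice of the Poincaré ball are assumptions made for this example, not details taken from the paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Illustrative assumption: the paper's actual hyperbolic model and metric
    are not stated in the abstract.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


if __name__ == "__main__":
    # The same Euclidean gap costs more hyperbolic distance near the boundary.
    near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
    near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
    print(near_origin, near_boundary)
```

Distances grow rapidly toward the boundary of the ball, which is why hyperbolic embeddings are often used for tree-like data: general, coarse items can sit near the origin and fine-grained items near the boundary.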

Related papers

EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition [9.75969669445091]
Visual Place Recognition (VPR) is a scene-oriented image retrieval problem in computer vision. We propose a novel, simple re-ranking method that refines global features through a Mixture-of-Features (MoF) approach under embodied constraints.
arXiv Detail & Related papers (2025-06-16T06:40:12Z)
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding [64.29499221878746]
Vision-language Models (VLMs) have shown remarkable capabilities in advancing general artificial intelligence. PyPE is a novel approach designed to enhance the perception of visual tokens within VLMs. Our method reduces the relative distance between interrelated visual elements and instruction tokens.
arXiv Detail & Related papers (2025-01-19T07:00:46Z)
EDTformer: An Efficient Decoder Transformer for Visual Place Recognition [34.875097011568336]
Visual place recognition (VPR) aims to determine the general geographical location of a query image. We propose an Efficient Decoder Transformer (EDTformer) for feature aggregation. Our EDTformer can fully utilize the contextual information within deep features.
arXiv Detail & Related papers (2024-12-01T12:14:36Z)
Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on static images. We propose to understand human attributes using video frames that can fully use temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network. It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification. Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR) is one of the research hotspots in robotics, which uses visual information to locate robots. We present a unified network capable of extracting global features for retrieving candidates via an aggregation module. We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
arXiv Detail & Related papers (2023-10-08T14:46:11Z)
AnyLoc: Towards Universal Visual Place Recognition [12.892386791383025]
Visual Place Recognition (VPR) is vital for robot localization. Most performant VPR approaches are environment- and task-specific. We develop a universal solution to VPR -- a technique that works across a broad range of structured and unstructured environments.
arXiv Detail & Related papers (2023-08-01T17:45:13Z)
MixVPR: Feature Mixing for Visual Place Recognition [3.6739949215165164]
Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving. We introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features. We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks.
arXiv Detail & Related papers (2023-03-03T19:24:03Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
Dynamic Prototype Mask for Occluded Person Re-Identification [88.7782299372656]
Existing methods mainly address occlusion by employing body clues provided by an extra network to distinguish the visible part. We propose a novel Dynamic Prototype Mask (DPM) based on two pieces of self-evident prior knowledge. Under this condition, the occluded representation could be well aligned in a selected subspace spontaneously.
arXiv Detail & Related papers (2022-07-19T03:31:13Z)
STA-VPR: Spatio-temporal Alignment for Visual Place Recognition [17.212503755962757]
We propose an adaptive dynamic time warping algorithm to align local features from the spatial domain while measuring the distance between two images. A local matching DTW algorithm is applied to perform image sequence matching based on temporal alignment. The results show that the proposed method significantly improves the CNN-based methods.
arXiv Detail & Related papers (2021-03-25T03:27:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.