Related papers: Breaking the Frame: Image Retrieval by Visual Overlap Prediction

URL: http://arxiv.org/abs/2406.16204v1
Date: Sun, 23 Jun 2024 20:00:20 GMT
Title: Breaking the Frame: Image Retrieval by Visual Overlap Prediction
Authors: Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath
Abstract summary: We propose a novel visual place recognition approach, VOP, that efficiently addresses occlusions and complex scenes. The proposed method enables the identification of visible image sections without requiring expensive feature detection and matching.
Score: 53.17564423756082
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a novel visual place recognition approach, VOP, that efficiently addresses occlusions and complex scenes by shifting from traditional reliance on global image similarities and local features to image overlap prediction. The proposed method enables the identification of visible image sections without requiring expensive feature detection and matching. By focusing on obtaining patch-level embeddings by a Vision Transformer backbone and establishing patch-to-patch correspondences, our approach uses a voting mechanism to assess overlap scores for potential database images, thereby providing a nuanced image retrieval metric in challenging scenarios. VOP leads to more accurate relative pose estimation and localization results on the retrieved image pairs than state-of-the-art baselines on a number of large-scale, real-world datasets. The code is available at https://github.com/weitong8591/vop.
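The abstract describes overlap-based retrieval only at a high level. The following minimal sketch shows the general idea of patch-level voting under stated assumptions. It takes precomputed patch embeddings (for example, from a ViT backbone) as plain NumPy arrays. A query patch votes for a database image when its best cosine similarity to that image's patches clears a threshold, and images are ranked by the fraction of voting patches. The mask of voting patches also shows which query sections are visible in the candidate. The function names, the 0.8 threshold, and the fraction-based normalization are hypothetical choices, not VOP's implementation. The authors' code is in the linked repository.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def matched_patches(query: np.ndarray, db: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Boolean mask over query patches that find a close match among db patches.

    query: (Nq, D) patch embeddings of the query image.
    db:    (Nd, D) patch embeddings of one database image.
    threshold: hypothetical cosine-similarity cut-off for a patch-to-patch match.
    """
    sim = l2_normalize(query) @ l2_normalize(db).T  # (Nq, Nd) cosine similarities
    return sim.max(axis=1) >= threshold             # best match per query patch


def overlap_score(query: np.ndarray, db: np.ndarray, threshold: float = 0.8) -> float:
    """Each matched query patch casts one vote; the score is the fraction of votes."""
    return float(matched_patches(query, db, threshold).mean())


def rank_database(query, database, threshold=0.8, top_k=5):
    """Return (index, score) pairs for the top_k database images by overlap score."""
    scores = [overlap_score(query, db, threshold) for db in database]
    order = np.argsort(scores)[::-1][:top_k]  # descending by score
    return [(int(i), scores[i]) for i in order]


if __name__ == "__main__":
    # Toy orthonormal "patch embeddings" so the expected result is easy to check.
    basis = np.eye(8)
    query = basis[:4]                          # 4 query patches
    database = [
        np.vstack([basis[:2], basis[4:6]]),    # shares 2 of the 4 query patches
        basis[:4].copy(),                      # shares all 4
        basis[4:],                             # shares none
    ]
    print(rank_database(query, database, top_k=3))
    # [(1, 1.0), (0, 0.5), (2, 0.0)]
```

Scoring by the fraction of matched query patches lets a database image that covers only part of the query, such as an occluded or cropped view, still rank above unrelated images. A single global descriptor would blur that distinction.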

Related papers

Context-Based Visual-Language Place Recognition [4.737519767218666]
A popular approach to vision-based place recognition relies on low-level visual features. We introduce a novel VPR approach that remains robust to scene changes and does not require additional training. Our method constructs semantic image descriptors by extracting pixel-level embeddings using a zero-shot, language-driven semantic segmentation model.
arXiv (2024-10-25T06:59:11Z)
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues [47.213906345208315]
We propose BRIDGE, a new learnable and reference-free image captioning metric. Our proposal achieves state-of-the-art results compared to existing reference-free evaluation scores.
arXiv (2024-07-29T18:00:17Z)
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition. Our method uses the attention mechanism to correlate multiple images within a batch. Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv (2024-02-29T15:05:11Z)
Data-efficient Large Scale Place Recognition with Graded Similarity Supervision [10.117451511942267]
Visual place recognition (VPR) is a fundamental task of computer vision for visual localization. Existing methods are trained using image pairs that either depict the same place or not. We deploy an automatic re-annotation strategy to re-label VPR datasets. We propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels for training contrastive networks.
arXiv (2023-03-21T10:56:57Z)
BEVPlace: Learning LiDAR-based Place Recognition using Bird's Eye View Images [20.30997801125592]
We explore the potential of a different representation in place recognition, i.e., bird's eye view (BEV) images. A simple VGGNet trained on BEV images achieves performance comparable to state-of-the-art place recognition methods in scenes with slight viewpoint changes. We develop a method to estimate the position of the query point cloud, extending the usage of place recognition.
arXiv (2023-02-28T05:37:45Z)
Generalizable Person Re-Identification via Viewpoint Alignment and Fusion [74.30861504619851]
This work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images. Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images. We show that our method can lead to superior performance over the existing approaches in various evaluation settings.
arXiv (2022-12-05T16:24:09Z)
Investigating the Role of Image Retrieval for Visual Localization -- An exhaustive benchmark [46.166955777187816]
This paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. We introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets. Using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance for some, but not all, paradigms.
arXiv (2022-05-31T12:59:01Z)
Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning. Current contrastive models are ineffective at localizing the foreground object. We propose a data-driven approach for learning invariance to backgrounds.
arXiv (2020-04-14T16:29:42Z)
Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision. We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv (2020-03-21T15:36:38Z)
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework that learns high-order relation and topology information for discriminative features and robust alignment. Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv (2020-03-18T12:18:35Z)
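The Data-efficient Large Scale Place Recognition entry mentions a Generalized Contrastive Loss (GCL) that trains with graded rather than binary similarity labels. This minimal sketch assumes GCL keeps the shape of the classic contrastive loss and replaces its binary same-place label with a graded similarity ψ ∈ [0, 1]: L = ψ·d²/2 + (1 − ψ)·max(m − d, 0)²/2, where d is the Euclidean distance between the two descriptors and m is a margin. The function name and margin default are illustrative and are not taken from that paper's code.

```python
import numpy as np


def generalized_contrastive_loss(f_i, f_j, psi, margin=0.5):
    """Contrastive loss with a graded similarity label psi in [0, 1] (assumed form).

    f_i, f_j: (..., D) descriptor pairs; psi: graded similarity per pair.
    psi = 1 reduces to the classic positive term (pull together),
    psi = 0 to the classic negative term (push apart up to the margin),
    and intermediate values blend the two.
    """
    f_i = np.asarray(f_i, dtype=float)
    f_j = np.asarray(f_j, dtype=float)
    psi = np.asarray(psi, dtype=float)
    d = np.linalg.norm(f_i - f_j, axis=-1)                         # Euclidean distance per pair
    attract = 0.5 * psi * d ** 2                                    # weighted by similarity
    repel = 0.5 * (1.0 - psi) * np.maximum(margin - d, 0.0) ** 2    # weighted by dissimilarity
    return float(np.mean(attract + repel))                          # average over pairs
```

For example, with d = 5 and margin 8, ψ = 0.5 gives 0.5·0.5·25 + 0.5·0.5·9 = 8.5. This falls halfway between the pure positive case (12.5) and the pure negative case (4.5), so a partially overlapping pair is neither fully pulled together nor fully pushed apart.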

This list is automatically generated from the titles and abstracts of the papers on this site.