Breaking the Frame: Image Retrieval by Visual Overlap Prediction
- URL: http://arxiv.org/abs/2406.16204v1
- Date: Sun, 23 Jun 2024 20:00:20 GMT
- Title: Breaking the Frame: Image Retrieval by Visual Overlap Prediction
- Authors: Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath,
- Abstract summary: We propose a novel visual place recognition approach, VOP, that efficiently addresses occlusions and complex scenes.
The proposed method enables the identification of visible image sections without requiring expensive feature detection and matching.
- Score: 53.17564423756082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel visual place recognition approach, VOP, that efficiently addresses occlusions and complex scenes by shifting from traditional reliance on global image similarities and local features to image overlap prediction. The proposed method enables the identification of visible image sections without requiring expensive feature detection and matching. By focusing on obtaining patch-level embeddings by a Vision Transformer backbone and establishing patch-to-patch correspondences, our approach uses a voting mechanism to assess overlap scores for potential database images, thereby providing a nuanced image retrieval metric in challenging scenarios. VOP leads to more accurate relative pose estimation and localization results on the retrieved image pairs than state-of-the-art baselines on a number of large-scale, real-world datasets. The code is available at https://github.com/weitong8591/vop.
Related papers
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z) - Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is individually augmented, independent from other patches within the same view.
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z) - Generalizable Person Re-Identification via Viewpoint Alignment and
Fusion [74.30861504619851]
This work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images.
Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images.
We show that our method can lead to superior performance over the existing approaches in various evaluation settings.
arXiv Detail & Related papers (2022-12-05T16:24:09Z) - End-to-end learning of keypoint detection and matching for relative pose
estimation [1.8352113484137624]
We propose a new method for estimating the relative pose between two images.
We jointly learn keypoint detection, description extraction, matching and robust pose estimation.
We demonstrate our method for the task of visual localization of a query image within a database of images with known pose.
arXiv Detail & Related papers (2021-04-02T15:16:17Z) - Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z) - Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z) - High-Order Information Matters: Learning Relation and Topology for
Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms state-of-the-art by6.5%mAP scores on Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.