Improving Calibration in Deep Metric Learning With Cross-Example Softmax
- URL: http://arxiv.org/abs/2011.08824v1
- Date: Tue, 17 Nov 2020 18:47:28 GMT
- Title: Improving Calibration in Deep Metric Learning With Cross-Example Softmax
- Authors: Andreas Veit, Kimberly Wilber
- Abstract summary: We propose Cross-Example Softmax which combines the properties of top-$k$ and threshold relevancy.
In each iteration, the proposed loss encourages all queries to be closer to their matching images than all queries are to all non-matching images.
This leads to a globally more calibrated similarity metric and makes distance more interpretable as an absolute measure of relevance.
- Score: 11.014197662964335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern image retrieval systems increasingly rely on the use of deep neural
networks to learn embedding spaces in which distance encodes the relevance
between a given query and image. In this setting, existing approaches tend to
emphasize one of two properties. Triplet-based methods capture top-$k$
relevancy, where all top-$k$ scoring documents are assumed to be relevant to a
given query. Pairwise contrastive models capture threshold relevancy, where all
documents scoring higher than some threshold are assumed to be relevant. In
this paper, we propose Cross-Example Softmax which combines the properties of
top-$k$ and threshold relevancy. In each iteration, the proposed loss
encourages all queries to be closer to their matching images than all queries
are to all non-matching images. This leads to a globally more calibrated
similarity metric and makes distance more interpretable as an absolute measure
of relevance. We further introduce Cross-Example Negative Mining, in which each
pair is compared to the hardest negative comparisons across the entire batch.
Empirically, we show in a series of experiments on Conceptual Captions and
Flickr30k, that the proposed method effectively improves global calibration and
also retrieval performance.
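The loss described in the abstract admits a compact sketch: each matching query-image pair competes in the softmax denominator against every non-matching pair in the batch, rather than only against the negatives in its own row. The following is a minimal NumPy illustration of that idea, not the authors' implementation; the function names, the temperature `tau`, the identity-matched similarity matrix, and the top-`k` mining variant's parameterization are our assumptions.

```python
import numpy as np

def cross_example_softmax_loss(sim, tau=0.05):
    """Sketch of Cross-Example Softmax.

    sim: (n, n) query-image similarity matrix; sim[i, i] is the matching
    pair for query i. Every off-diagonal entry (k, l), k != l, is a
    non-matching pair, and all of them appear in every query's denominator.
    """
    n = sim.shape[0]
    logits = sim / tau
    # All non-matching pairs across the batch, shared by every query.
    off_diag = logits[~np.eye(n, dtype=bool)]
    # Shift by the max for numerical stability before exponentiating.
    m = max(off_diag.max(), logits.diagonal().max())
    neg_sum = np.exp(off_diag - m).sum()
    pos = np.exp(logits.diagonal() - m)
    return float(-np.log(pos / (pos + neg_sum)).mean())

def cross_example_mined_loss(sim, k=8, tau=0.05):
    """Sketch of Cross-Example Negative Mining: keep only the k hardest
    (highest-similarity) non-matching pairs from the whole batch."""
    n = sim.shape[0]
    logits = sim / tau
    hardest = np.sort(logits[~np.eye(n, dtype=bool)])[-k:]
    m = max(hardest.max(), logits.diagonal().max())
    neg_sum = np.exp(hardest - m).sum()
    pos = np.exp(logits.diagonal() - m)
    return float(-np.log(pos / (pos + neg_sum)).mean())
```

Because the negative set is shared across all queries, similarity scores are pushed onto a common scale, which is what makes distance interpretable as an absolute measure of relevance rather than a per-query ranking signal.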
Related papers
- HomoMatcher: Dense Feature Matching Results with Semi-Dense Efficiency by Homography Estimation [39.48940223810725]
Feature matching between image pairs is a fundamental problem in computer vision that drives many applications, such as SLAM.
This paper concentrates on enhancing the fine-matching module in the semi-dense matching framework.
We employ a lightweight and efficient homography estimation network to generate the perspective mapping between patches obtained from coarse matching.
arXiv Detail & Related papers (2024-11-11T04:05:12Z)
- PRISM: PRogressive dependency maxImization for Scale-invariant image Matching [4.9521269535586185]
We propose PRogressive dependency maxImization for Scale-invariant image Matching (PRISM).
Our method's superior matching performance and generalization capability are confirmed by leading accuracy across various evaluation benchmarks and downstream tasks.
arXiv Detail & Related papers (2024-08-07T07:35:17Z)
- Handbook on Leveraging Lines for Two-View Relative Pose Estimation [82.72686460985297]
We propose an approach for estimating the relative pose between image pairs by jointly exploiting points, lines, and their coincidences in a hybrid manner.
Our hybrid framework combines the advantages of all configurations, enabling robust and accurate estimation in challenging environments.
arXiv Detail & Related papers (2023-09-27T21:43:04Z)
- Quantity-Aware Coarse-to-Fine Correspondence for Image-to-Point Cloud Registration [4.954184310509112]
Image-to-point cloud registration aims to determine the relative camera pose between an RGB image and a reference point cloud.
Matching individual points with pixels can be inherently ambiguous due to modality gaps.
We propose a framework to capture quantity-aware correspondences between local point sets and pixel patches.
arXiv Detail & Related papers (2023-07-14T03:55:54Z)
- Hierarchical Matching and Reasoning for Multi-Query Image Retrieval [113.44470784756308]
We propose a novel Hierarchical Matching and Reasoning Network (HMRN) for Multi-Query Image Retrieval (MQIR).
It disentangles MQIR into three hierarchical semantic representations, which are responsible for capturing fine-grained local details, contextual global scopes, and high-level inherent correlations.
Our HMRN substantially surpasses the current state-of-the-art methods.
arXiv Detail & Related papers (2023-06-26T07:03:56Z)
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method based on contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- COTR: Correspondence Transformer for Matching Across Images [31.995943755283786]
We propose a novel framework for finding correspondences in images based on a deep neural network.
By doing so, one has the option to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings.
arXiv Detail & Related papers (2021-03-25T22:47:02Z)
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [57.33699905852397]
We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data.
arXiv Detail & Related papers (2020-06-17T14:00:42Z)
- Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
arXiv Detail & Related papers (2020-02-29T08:37:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.