Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?
- URL: http://arxiv.org/abs/2202.07474v1
- Date: Mon, 14 Feb 2022 15:18:00 GMT
- Title: Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?
- Authors: Maurits Bleeker and Maarten de Rijke
- Abstract summary: The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
- Score: 67.45267657995748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The triplet loss with semi-hard negatives has become the de facto choice for
image-caption retrieval (ICR) methods that are optimized from scratch. Recent
progress in metric learning has given rise to new loss functions that
outperform the triplet loss on tasks such as image retrieval and representation
learning. We ask whether these findings generalize to the setting of ICR by
comparing three loss functions on two ICR methods. We answer this question
negatively: the triplet loss with semi-hard negative mining still outperforms
newly introduced loss functions from metric learning on the ICR task. To gain a
better understanding of these outcomes, we introduce an analysis method to
compare loss functions by counting how many samples contribute to the gradient
w.r.t. the query representation during optimization. We find that loss
functions that result in lower evaluation scores on the ICR task, in general,
take too many (non-informative) samples into account when computing a gradient
w.r.t. the query representation, which results in sub-optimal performance. The
triplet loss with semi-hard negatives is shown to outperform the other loss
functions, as it only takes one (hard) negative into account when computing the
gradient.
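To make the gradient-counting analysis above concrete, here is a minimal, self-contained sketch (not the authors' released code) that counts how many in-batch negatives contribute to the gradient w.r.t. a single query embedding under (a) a triplet loss with semi-hard negative mining and (b) an InfoNCE-style contrastive loss. The embedding dimension, margin, temperature, and toy data are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): count how many in-batch negatives
# contribute to the gradient w.r.t. one query embedding under two losses.
# Dimension, margin, temperature, and toy data below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

dim, num_negatives = 32, 127
query = l2_normalize(rng.normal(size=dim))
positive = l2_normalize(query + 0.3 * rng.normal(size=dim))      # noisy "matching caption"
negatives = l2_normalize(rng.normal(size=(num_negatives, dim)))  # other captions in the batch

s_pos = query @ positive   # cosine similarity to the positive
s_neg = negatives @ query  # cosine similarities to all negatives

# (a) Triplet loss with semi-hard negative mining (margin is an assumption).
# Semi-hard: less similar than the positive, but within the margin.
margin = 0.2
semi_hard = (s_neg < s_pos) & (s_neg > s_pos - margin)
# Only the single mined negative enters the hinge, so at most one negative
# contributes to the gradient w.r.t. the query.
triplet_contributors = 1 if semi_hard.any() else 0

# (b) InfoNCE-style contrastive loss (temperature is an assumption).
tau = 0.07
logits = np.concatenate(([s_pos], s_neg)) / tau
weights = np.exp(logits - logits.max())
weights /= weights.sum()
# Every negative with non-negligible softmax weight pushes the query around.
infonce_contributors = int((weights[1:] > 1e-6).sum())

print(f"triplet (semi-hard): {triplet_contributors} contributing negative(s)")
print(f"InfoNCE-style loss:  {infonce_contributors} contributing negative(s)")
```

In this toy setup the mined triplet loss lets at most one negative enter the hinge, while the softmax in the contrastive loss spreads gradient over (almost) all in-batch negatives, which mirrors the explanation given in the abstract.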
Related papers
- Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls within the metric learning paradigm, yet directly optimizes for the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z) - Tuned Contrastive Learning [77.67209954169593]
We propose a novel contrastive loss function -- Tuned Contrastive Learning (TCL) loss.
TCL generalizes to multiple positives and negatives in a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives.
We show how to extend TCL to the self-supervised setting and empirically compare it with various SOTA self-supervised learning methods.
arXiv Detail & Related papers (2023-05-18T03:26:37Z) - SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification [0.9236074230806579]
We propose two loss functions that account for the importance of the embedding vectors by looking at the intra-class and inter-class distances among the few available samples.
Our results show a significant accuracy improvement of 2% on the miniImageNet benchmark compared to other metric-based few-shot learning methods.
arXiv Detail & Related papers (2023-05-15T23:12:09Z) - Dissecting the impact of different loss functions with gradient surgery [7.001832294837659]
Pair-wise losses are an approach to metric learning that learn a semantic embedding by optimizing a loss function defined over pairs of examples.
Here we decompose the gradient of these loss functions into components that relate to how they push the relative feature positions of the anchor-positive and anchor-negative pairs (a generic worked example of such a decomposition is sketched after this list).
arXiv Detail & Related papers (2022-01-27T03:55:48Z) - Class Interference Regularization [7.248447600071719]
Contrastive losses yield state-of-the-art performance for person re-identification, face verification and few shot learning.
We propose a novel, simple, and effective regularization technique, the Class Interference Regularization (CIR).
CIR perturbs the output features by randomly moving them towards the average embeddings of the negative classes.
arXiv Detail & Related papers (2020-09-04T21:03:32Z) - Learning Condition Invariant Features for Retrieval-Based Localization from 1M Images [85.81073893916414]
We develop a novel method for learning localization features that are more accurate and generalize better.
On the challenging Oxford RobotCar night condition, our method outperforms the well-known triplet loss by 24.4% in localization accuracy within 5m.
arXiv Detail & Related papers (2020-08-27T14:46:22Z) - A Unified Framework of Surrogate Loss by Refactoring and Interpolation [65.60014616444623]
We introduce UniLoss, a unified framework to generate surrogate losses for training deep networks with gradient descent.
We validate the effectiveness of UniLoss on three tasks and four datasets.
arXiv Detail & Related papers (2020-07-27T21:16:51Z) - An Equivalence between Loss Functions and Non-Uniform Sampling in Experience Replay [72.23433407017558]
We show that any loss function evaluated with non-uniformly sampled data can be transformed into another uniformly sampled loss function.
Surprisingly, we find that in some environments prioritized experience replay (PER) can be replaced entirely by this new loss function without any impact on empirical performance.
arXiv Detail & Related papers (2020-07-12T17:45:24Z)
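For the gradient-decomposition entry above ("Dissecting the impact of different loss functions with gradient surgery"), the following is a generic illustration using the standard margin-based triplet loss; it is written from the textbook definition, not taken from that paper, and assumes amsmath.

```latex
% Triplet loss for anchor a, positive p, negative n, with margin \alpha:
%   L(a, p, n) = \max(0, \|a - p\|^2 - \|a - n\|^2 + \alpha).
% When the hinge is active, the gradient w.r.t. the anchor splits into an
% anchor-positive component and an anchor-negative component:
\[
  \frac{\partial L}{\partial a}
  = \underbrace{2\,(a - p)}_{\text{anchor-positive term}}
  \;-\; \underbrace{2\,(a - n)}_{\text{anchor-negative term}},
\]
% so a gradient-descent step moves the anchor toward the positive and away
% from the negative; decompositions of this kind are what the entry above
% analyzes component by component.
```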