Unified Loss of Pair Similarity Optimization for Vision-Language
Retrieval
- URL: http://arxiv.org/abs/2209.13869v1
- Date: Wed, 28 Sep 2022 07:01:22 GMT
- Title: Unified Loss of Pair Similarity Optimization for Vision-Language
Retrieval
- Authors: Zheng Li, Caili Guo, Xin Wang, Zerun Feng, Jenq-Neng Hwang, Zhongtian
Du
- Abstract summary: There are two popular loss functions used for vision-language retrieval, i.e., triplet loss and contrastive learning loss.
This paper proposes a unified loss of pair similarity optimization for vision-language retrieval.
- Score: 35.141916376979836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are two popular loss functions used for vision-language retrieval,
i.e., triplet loss and contrastive learning loss, both of which essentially
minimize the difference between the similarities of negative pairs and positive
pairs. More specifically, Triplet loss with Hard Negative mining (Triplet-HN),
which is widely used in existing retrieval models to improve discriminative
ability, easily falls into local minima during training. On the other hand,
Vision-Language Contrastive learning loss (VLC), which is widely used in the
vision-language pre-training, has been shown to achieve significant performance
gains on vision-language retrieval, but the performance of fine-tuning with VLC
on small datasets is not satisfactory. This paper proposes a unified loss of
pair similarity optimization for vision-language retrieval, providing a
powerful tool for understanding existing loss functions. Our unified loss
includes the hard sample mining strategy of VLC and introduces the margin used
by the triplet loss for better similarity separation. It is shown that both
Triplet-HN and VLC are special forms of our unified loss. Compared with the
Triplet-HN, our unified loss has a fast convergence speed. Compared with the
VLC, our unified loss is more discriminative and can provide better
generalization in downstream fine-tuning tasks. Experiments on image-text and
video-text retrieval benchmarks show that our unified loss can significantly
improve the performance of the state-of-the-art retrieval models.
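The abstract contrasts Triplet-HN (hardest-negative mining plus a margin) with VLC (temperature-scaled contrastive loss), and describes a unified loss that combines VLC-style soft hard-negative mining with the triplet margin. The paper's exact formulation is not reproduced here; the sketch below is an illustrative NumPy version under assumed standard forms of the two losses, in which a temperature-smoothed log-sum-exp over negatives recovers the hardest negative as the temperature shrinks:

```python
import numpy as np

def triplet_hn_loss(sim, margin=0.2):
    """Triplet loss with hard negative mining (Triplet-HN).
    sim: (B, B) similarity matrix; the diagonal holds positive pairs."""
    pos = np.diag(sim)                      # positive-pair similarities
    neg = sim.copy()
    np.fill_diagonal(neg, -np.inf)          # exclude positives from mining
    hardest = neg.max(axis=1)               # hardest negative per anchor
    return np.maximum(0.0, margin + hardest - pos).mean()

def vlc_loss(sim, tau=0.07):
    """Vision-Language Contrastive (InfoNCE-style) loss, one retrieval
    direction: cross-entropy over rows with diagonal targets."""
    logits = sim / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()

def unified_loss(sim, margin=0.2, tau=0.07):
    """Hypothetical unified form (illustration, not the paper's exact loss):
    a temperature-smoothed hardest negative (VLC-style soft mining) pushed
    below the positive by a triplet-style margin. As tau -> 0 the smoothed
    term approaches the true hardest negative, recovering Triplet-HN."""
    pos = np.diag(sim)
    neg = sim.copy()
    np.fill_diagonal(neg, -np.inf)
    m = neg.max(axis=1, keepdims=True)      # shift for stable log-sum-exp
    soft_hardest = (m + tau * np.log(
        np.exp((neg - m) / tau).sum(axis=1, keepdims=True))).ravel()
    return np.maximum(0.0, margin + soft_hardest - pos).mean()
```

Since the scaled log-sum-exp upper-bounds the row maximum and converges to it as `tau` shrinks, `unified_loss` interpolates between soft mining over all negatives (smooth gradients, faster convergence) and the hard selection of Triplet-HN, which is the sense in which both named losses appear as special forms.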
Related papers
- Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimizes the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z)
- Expressive Losses for Verified Robustness via Convex Combinations
We study the relationship between the over-approximation coefficient and performance profiles across different expressive losses.
We show that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.
arXiv Detail & Related papers (2023-05-23T12:20:29Z)
- Tuned Contrastive Learning [77.67209954169593]
We propose a novel contrastive loss function -- Tuned Contrastive Learning (TCL) loss.
TCL generalizes to multiple positives and negatives in a batch and offers parameters to tune and improve the gradient responses from hard positives and hard negatives.
We show how to extend TCL to the self-supervised setting and empirically compare it with various SOTA self-supervised learning methods.
arXiv Detail & Related papers (2023-05-18T03:26:37Z)
- SuSana Distancia is all you need: Enforcing class separability in metric learning via two novel distance-based loss functions for few-shot image classification [0.9236074230806579]
We propose two loss functions which consider the importance of the embedding vectors by looking at the intra-class and inter-class distance between the few data.
Our results show a significant improvement in accuracy on the miniImageNet benchmark, outperforming other metric-based few-shot learning methods by a margin of 2%.
arXiv Detail & Related papers (2023-05-15T23:12:09Z)
- Adaptive Sparse Pairwise Loss for Object Re-Identification [25.515107212575636]
Pairwise losses play an important role in training a strong ReID network.
We propose a novel loss paradigm termed Sparse Pairwise (SP) loss.
We show that SP loss and its adaptive variant AdaSP loss outperform other pairwise losses.
arXiv Detail & Related papers (2023-03-31T17:59:44Z)
- Benchmarking Deep AUROC Optimization: Loss Functions and Algorithmic Choices [37.559461866831754]
We benchmark a variety of loss functions with different algorithmic choices for deep AUROC optimization problem.
We highlight the essential choices such as positive sampling rate, regularization, normalization/activation, and weights.
Our findings show that although Adam-type methods are more competitive from the training perspective, they do not outperform others from the testing perspective.
arXiv Detail & Related papers (2022-03-27T00:47:00Z)
- Do Lessons from Metric Learning Generalize to Image-Caption Retrieval? [67.45267657995748]
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch.
Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning.
We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods.
arXiv Detail & Related papers (2022-02-14T15:18:00Z)
- Robust Contrastive Learning against Noisy Views [79.71880076439297]
We propose a new contrastive loss function that is robust against noisy views.
We show that our approach provides consistent improvements over the state-of-the-art image, video, and graph contrastive learning benchmarks.
arXiv Detail & Related papers (2022-01-12T05:24:29Z)
- A Decidability-Based Loss Function [2.5919311269669003]
Biometric problems often use deep learning models to extract features from images, also known as embeddings.
In this work, a loss function based on the decidability index is proposed to improve the quality of embeddings for the verification routine.
The proposed approach is compared against the Softmax (cross-entropy), Triplets Soft-Hard, and the Multi Similarity losses in four different benchmarks.
arXiv Detail & Related papers (2021-09-12T14:26:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.