Adaptive Offline Quintuplet Loss for Image-Text Matching
- URL: http://arxiv.org/abs/2003.03669v3
- Date: Wed, 22 Jul 2020 14:58:18 GMT
- Title: Adaptive Offline Quintuplet Loss for Image-Text Matching
- Authors: Tianlang Chen, Jiajun Deng, Jiebo Luo
- Abstract summary: Existing image-text matching approaches typically leverage triplet loss with online hard negatives to train the model.
We propose solutions by sampling negatives offline from the whole training set.
We evaluate the proposed training approach on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets.
- Score: 102.50814151323965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing image-text matching approaches typically leverage triplet loss with
online hard negatives to train the model. For each image or text anchor in a
training mini-batch, the model is trained to distinguish between a positive and
the most confusing negative of the anchor mined from the mini-batch (i.e.
online hard negative). This strategy improves the model's capacity to discover
fine-grained correspondences and non-correspondences between image and text
inputs. However, the above approach has the following drawbacks: (1) The
negative selection strategy still provides limited chances for the model to
learn from very hard-to-distinguish cases. (2) The trained model has weak
generalization capability from the training set to the testing set. (3) The
penalty lacks hierarchy and adaptiveness for hard negatives with different
"hardness" degrees. In this paper, we propose solutions by sampling negatives
offline from the whole training set. It provides "harder" offline negatives
than online hard negatives for the model to distinguish. Based on the offline
hard negatives, a quintuplet loss is proposed to improve the model's
generalization capability to distinguish positives and negatives. In addition,
a novel loss function that combines the knowledge of positives, offline hard
negatives and online hard negatives is created. It leverages offline hard
negatives as the intermediary to adaptively penalize them based on their
distance relations to the anchor. We evaluate the proposed training approach on
three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets.
Significant performance improvements are observed for all the models, proving
the effectiveness and generality of our approach. Code is available at
https://github.com/sunnychencool/AOQ
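Since the abstract describes the loss only at a high level, the following is a minimal PyTorch-style sketch of how a quintuplet-style hinge loss combining a positive, an online hard negative, and an offline hard negative could be wired up. The function name, margin values, and index tensors are illustrative assumptions; the exact formulation and the adaptive, distance-based weighting are defined in the paper and the linked repository.
```python
import torch

def quintuplet_style_loss(sim, pos_idx, on_neg_idx, off_neg_idx,
                          margin_online=0.2, margin_offline=0.3):
    """Illustrative quintuplet-style hinge loss -- NOT the paper's exact formulation.

    sim         : (B, N) anchor-to-candidate similarity matrix
    pos_idx     : (B,)   index of the ground-truth positive for each anchor
    on_neg_idx  : (B,)   online hard negative mined inside the mini-batch
    off_neg_idx : (B,)   harder negative mined offline from the whole training set
    """
    rows = torch.arange(sim.size(0), device=sim.device)
    s_pos = sim[rows, pos_idx]      # similarity to the positive
    s_on = sim[rows, on_neg_idx]    # similarity to the online hard negative
    s_off = sim[rows, off_neg_idx]  # similarity to the offline hard negative

    # Standard online-hard-negative hinge: the positive must beat the batch-mined negative.
    loss_online = torch.clamp(margin_online + s_on - s_pos, min=0.0)
    # Extra hinge against the offline hard negative, with its own (here larger) margin,
    # so negatives of different "hardness" degrees receive a hierarchical penalty.
    loss_offline = torch.clamp(margin_offline + s_off - s_pos, min=0.0)
    return (loss_online + loss_offline).mean()
```
The sketch only captures the hierarchical margin structure; the adaptive penalty that uses offline hard negatives as the intermediary and scales the loss by their distance relations to the anchor is omitted, and the authors' implementation in the linked repository should be treated as the reference.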
Related papers
- Conan-embedding: General Text Embedding with More and Better Negative Samples [30.571206231457932]
We propose the Conan-embedding model, which maximizes the utilization of more and higher-quality negative examples.
Our approach effectively enhances the capabilities of embedding models, currently ranking first on the Chinese leaderboard of the Massive Text Embedding Benchmark (MTEB).
arXiv Detail & Related papers (2024-08-28T11:18:06Z)
- Active Mining Sample Pair Semantics for Image-text Matching [6.370886833310617]
This paper proposes a novel image-text matching model, called the Active Mining Sample Pair Semantics image-text matching model (AMSPS).
Compared with the single semantic learning mode of commonsense learning models trained with a triplet loss, AMSPS follows an active learning idea.
arXiv Detail & Related papers (2023-11-09T15:03:57Z)
- Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining [58.379339799777064]
Large-scale visual language models (VLMs) exhibit strong representation capacities, making them ubiquitous for enhancing image and text understanding tasks.
We propose a framework that not only mines in both directions but also generates challenging negative samples in both modalities.
Our code and dataset are released at https://ugorsahin.github.io/enhancing-multimodal-compositional-reasoning-of-vlm.html.
arXiv Detail & Related papers (2023-11-07T13:05:47Z)
- Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
arXiv Detail & Related papers (2023-08-08T16:31:43Z)
- Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences for Image-Text Retrieval [19.161248757493386]
We propose TAiloring neGative Sentences with Discrimination and Correction (TAGS-DC), which automatically generates synthetic sentences as negative samples.
To maintain the difficulty of these negatives during training, we mutually improve retrieval and generation through parameter sharing.
In experiments, we verify the effectiveness of our model on MS-COCO and Flickr30K compared with current state-of-the-art models.
arXiv Detail & Related papers (2021-11-05T09:36:41Z)
- Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross-entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization goal helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
arXiv Detail & Related papers (2021-09-12T23:14:06Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework, Self-Damaging Contrastive Learning (SDCLR), to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Contrastive Learning with Hard Negative Samples [80.12117639845678]
We develop a new family of unsupervised sampling methods for selecting hard negative samples.
A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible.
The proposed method improves downstream performance across multiple modalities, requires only a few additional lines of code to implement, and introduces no computational overhead (a hedged sketch of this kind of hard-negative reweighting appears after this list).
arXiv Detail & Related papers (2020-10-09T14:18:53Z)
- SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding learns a latent representation for each node in an unsupervised manner.
A key to the success of such contrastive learning methods is how positive and negative samples are drawn.
In this paper, we propose SCE for unsupervised network embedding using only negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
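As a companion to the "Contrastive Learning with Hard Negative Samples" entry above (the sketch promised there): a generic, hedged example of up-weighting hard negatives inside an InfoNCE-style contrastive loss. The exponential reweighting, the `beta` hyperparameter, and all tensor names are assumptions for illustration, not that paper's exact estimator.
```python
import torch

def hard_negative_infonce(anchor, candidates, pos_idx, temperature=0.07, beta=1.0):
    """Generic InfoNCE-style loss with exponential up-weighting of hard negatives.

    anchor     : (B, D) L2-normalized anchor embeddings
    candidates : (N, D) L2-normalized candidate embeddings
    pos_idx    : (B,)   index of the positive candidate for each anchor
    beta       : hardness concentration; beta = 0 recovers uniform negative weighting
    """
    sim = anchor @ candidates.t() / temperature        # (B, N) scaled similarities
    rows = torch.arange(anchor.size(0), device=anchor.device)

    pos_logit = sim[rows, pos_idx]                     # similarity to the positive
    neg_mask = torch.ones_like(sim, dtype=torch.bool)
    neg_mask[rows, pos_idx] = False                    # drop the positive column

    # Negatives that are more similar to the anchor ("harder") get larger weights.
    weights = torch.exp(beta * sim) * neg_mask
    weights = weights / weights.sum(dim=1, keepdim=True)

    # Importance-weighted estimate of the negative partition term.
    neg_term = (weights * torch.exp(sim)).sum(dim=1) * neg_mask.sum(dim=1)
    loss = -pos_logit + torch.log(torch.exp(pos_logit) + neg_term)
    return loss.mean()
```
Setting beta above zero concentrates the negative term on the most confusing candidates, which is the general effect that hard-negative sampling methods of this kind aim for.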