Active Mining Sample Pair Semantics for Image-text Matching
- URL: http://arxiv.org/abs/2311.05425v1
- Date: Thu, 9 Nov 2023 15:03:57 GMT
- Title: Active Mining Sample Pair Semantics for Image-text Matching
- Authors: Yongfeng Chen, Jin Liu, Zhijing Yang, Ruihan Chen, Junpeng Tan
- Abstract summary: This paper proposes a novel image-text matching model, the Active Mining Sample Pair Semantics image-text matching model (AMSPS).
Unlike the single semantic learning mode of commonsense learning models trained with a triplet loss function, AMSPS follows an active learning idea.
- Score: 6.370886833310617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, commonsense learning has been a hot topic in image-text matching.
Although it can capture richer image-text correlations, commonsense learning
still has some shortcomings: 1) Existing methods rely on a triplet semantic
similarity measurement loss, which cannot effectively handle the intractable
negative samples in image-text pairs. 2) The weak generalization ability of
such models leads to poor image-text matching performance on large-scale
datasets. To address these shortcomings, this paper proposes a novel
image-text matching model, called the Active Mining Sample Pair Semantics
image-text matching model (AMSPS). Unlike the single semantic learning mode
of commonsense learning models trained with a triplet loss function, AMSPS
follows an active learning idea. First, the proposed Adaptive Hierarchical
Reinforcement Loss (AHRL) provides diversified learning modes; its active
learning mode enables the model to focus more on intractable negative samples,
strengthening its discriminative ability. In addition, AMSPS can adaptively
mine more hidden relevant semantic representations from unannotated items,
which greatly improves the performance and generalization ability of the
model. Experimental results on the Flickr30K and MSCOCO benchmark datasets
show that our proposed method outperforms state-of-the-art comparison methods.
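For concreteness, here is a minimal PyTorch sketch of the triplet ranking loss with hardest-negative mining that the abstract identifies as the single-mode baseline. The abstract does not specify the AHRL formulation itself, so this shows only the objective being improved upon; the function name, margin value, and tensor shapes are illustrative assumptions, not the authors' code.

```python
import torch

def triplet_loss_hardest_negative(img_emb, txt_emb, margin=0.2):
    """Triplet ranking loss with hardest-negative mining (VSE++-style),
    the single-mode baseline the abstract contrasts AHRL against.

    img_emb, txt_emb: L2-normalized (B, D) embeddings; row i of each
    forms a matched image-text pair.
    """
    scores = img_emb @ txt_emb.t()                  # (B, B) cosine scores
    pos = scores.diag().view(-1, 1)                 # matched-pair scores
    mask = torch.eye(scores.size(0), dtype=torch.bool,
                     device=scores.device)

    # Margin violation of every in-batch negative, per retrieval direction.
    cost_txt = (margin + scores - pos).clamp(min=0).masked_fill(mask, 0)
    cost_img = (margin + scores - pos.t()).clamp(min=0).masked_fill(mask, 0)

    # Only the single most-violating in-batch negative contributes gradient.
    return cost_txt.max(dim=1)[0].sum() + cost_img.max(dim=0)[0].sum()
```

Under this objective, learning is driven only by the hardest negative found inside each mini-batch, which is consistent with the abstract's claim that a plain triplet loss cannot effectively handle intractable negatives.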
Related papers
- Active Learning for Finely-Categorized Image-Text Retrieval by Selecting Hard Negative Unpaired Samples [7.883521157895832]
Securing a sufficient amount of paired data is important to train an image-text retrieval (ITR) model.
We propose an active learning algorithm for ITR that can collect paired data cost-efficiently.
We validate the effectiveness of the proposed method on Flickr30K and MS-COCO datasets.
arXiv Detail & Related papers (2024-05-25T16:50:33Z) - FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z) - Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial
Margin Contrastive Learning [35.404100473539195]
Text-video retrieval aims to rank relevant text/video higher than irrelevant ones.
Recent contrastive learning methods have shown promising results for text-video retrieval.
This paper improves contrastive learning using two novel techniques.
arXiv Detail & Related papers (2023-09-20T06:08:11Z) - ALIP: Adaptive Language-Image Pre-training with Synthetic Caption [78.93535202851278]
Contrastive Language-Image Pre-training (CLIP) has significantly boosted the performance of various vision-language tasks.
The presence of intrinsic noise and unmatched image-text pairs in web data can potentially affect the performance of representation learning.
We propose Adaptive Language-Image Pre-training (ALIP), a bi-path model that integrates supervision from both raw text and synthetic captions.
arXiv Detail & Related papers (2023-08-16T15:19:52Z) - Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic [72.60554897161948]
Recent text-to-image matching models apply contrastive learning to large corpora of uncurated pairs of images and sentences.
In this work, we repurpose such models to generate a descriptive text given an image at inference time.
The resulting captions are much less restrictive than those obtained by supervised captioning methods.
arXiv Detail & Related papers (2021-11-29T11:01:49Z) - Negative Sample is Negative in Its Own Way: Tailoring Negative Sentences
for Image-Text Retrieval [19.161248757493386]
We propose TAiloring neGative Sentences with Discrimination and Correction (TAGS-DC), which automatically generates synthetic sentences as negative samples.
To maintain the difficulty of negative samples during training, we mutually improve retrieval and generation through parameter sharing.
In experiments, we verify the effectiveness of our model on MS-COCO and Flickr30K compared with current state-of-the-art models.
arXiv Detail & Related papers (2021-11-05T09:36:41Z) - Contrastive Learning of Visual-Semantic Embeddings [4.7464518249313805]
We propose two loss functions based on normalized cross-entropy to perform the task of learning joint visual-semantic embeddings; a minimal sketch of this loss family appears after this list.
We compare our results with existing visual-semantic embedding methods on cross-modal image-to-text and text-to-image retrieval tasks.
arXiv Detail & Related papers (2021-10-17T17:28:04Z) - Delving into Inter-Image Invariance for Unsupervised Visual
Representations [108.33534231219464]
We present a study to better understand the role of inter-image invariance learning.
Online labels converge faster than offline labels.
Semi-hard negative samples are more reliable and unbiased than hard negative samples.
arXiv Detail & Related papers (2020-08-26T17:44:23Z) - Adaptive Offline Quintuplet Loss for Image-Text Matching [102.50814151323965]
Existing image-text matching approaches typically leverage triplet loss with online hard negatives to train the model.
We propose solutions by sampling negatives offline from the whole training set (see the offline mining sketch after this list).
We evaluate the proposed training approach on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets.
arXiv Detail & Related papers (2020-03-07T22:09:11Z) - Learning to Compare Relation: Semantic Alignment for Few-Shot Learning [48.463122399494175]
We present a novel semantic alignment model to compare relations, which is robust to content misalignment.
We conduct extensive experiments on several few-shot learning datasets.
arXiv Detail & Related papers (2020-02-29T08:37:02Z)
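Two of the objectives summarized above are concrete enough to sketch under stated assumptions. First, for "Contrastive Learning of Visual-Semantic Embeddings": the sketch below is a generic symmetric normalized cross-entropy (InfoNCE-style) loss over a batch of aligned image-text pairs, not the two specific variants that paper proposes; the temperature value and function names are assumptions.

```python
import torch
import torch.nn.functional as F

def nce_matching_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric normalized cross-entropy over a batch of aligned
    image-text pairs: each image must pick out its own caption among
    all captions in the batch, and each caption its own image."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the two retrieval directions (image-to-text, text-to-image).
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

Second, for "Adaptive Offline Quintuplet Loss": the quintuplet construction is not detailed in the summary above, so this sketch shows only the generic offline hard-negative mining step, with hypothetical names.

```python
import torch

def mine_offline_hard_negatives(img_emb, txt_emb, k=10):
    """Offline step: score every image against every caption in the
    training set and keep the k highest-scoring non-matching captions
    per image as hard-negative candidates for subsequent epochs."""
    scores = img_emb @ txt_emb.t()                # (N, N) full score table
    scores.fill_diagonal_(float('-inf'))          # mask out the true pairs
    return scores.topk(k, dim=1).indices          # (N, k) negative indices
```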