Exploring the Impact of Negative Samples of Contrastive Learning: A Case
Study of Sentence Embedding
- URL: http://arxiv.org/abs/2202.13093v2
- Date: Tue, 1 Mar 2022 12:47:45 GMT
- Title: Exploring the Impact of Negative Samples of Contrastive Learning: A Case
Study of Sentence Embedding
- Authors: Rui Cao, Yihao Wang, Yuxin Liang, Ling Gao, Jie Zheng, Jie Ren, Zheng
Wang
- Abstract summary: We present a momentum contrastive learning model with negative sample queue for sentence embedding, namely MoCoSE.
We define a maximum traceable distance metric, through which we learn to what extent the text contrastive learning benefits from the historical information of negative samples.
Our experiments find that the best results are obtained when the maximum traceable distance is at a certain range, demonstrating that there is an optimal range of historical information for a negative sample queue.
- Score: 14.295787044482136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning is emerging as a powerful technique for extracting
knowledge from unlabeled data. This technique requires a balanced mixture of
two ingredients: positive (similar) and negative (dissimilar) samples. This is
typically achieved by maintaining a queue of negative samples during training.
Prior works in the area typically uses a fixed-length negative sample queue,
but how the negative sample size affects the model performance remains unclear.
The opaque impact of the number of negative samples on performance when
employing contrastive learning aroused our in-depth exploration. This paper
presents a momentum contrastive learning model with negative sample queue for
sentence embedding, namely MoCoSE. We add the prediction layer to the online
branch to make the model asymmetric and together with EMA update mechanism of
the target branch to prevent model from collapsing. We define a maximum
traceable distance metric, through which we learn to what extent the text
contrastive learning benefits from the historical information of negative
samples. Our experiments find that the best results are obtained when the
maximum traceable distance is at a certain range, demonstrating that there is
an optimal range of historical information for a negative sample queue. We
evaluate the proposed unsupervised MoCoSE on the semantic text similarity (STS)
task and obtain an average Spearman's correlation of $77.27\%$. Source code is
available at https://github.com/xbdxwyh/mocose
Related papers
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Graph Ranking Contrastive Learning: A Extremely Simple yet Efficient Method [17.760628718072144]
InfoNCE uses augmentation techniques to obtain two views, where a node in one view acts as the anchor, the corresponding node in the other view serves as the positive sample, and all other nodes are regarded as negative samples.
The goal is to minimize the distance between the anchor node and positive samples and maximize the distance to negative samples.
Due to the lack of label information during training, InfoNCE inevitably treats samples from the same class as negative samples, leading to the issue of false negative samples.
We propose GraphRank, a simple yet efficient graph contrastive learning method that addresses the problem of false negative samples
arXiv Detail & Related papers (2023-10-23T03:15:57Z) - Your Negative May not Be True Negative: Boosting Image-Text Matching
with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
arXiv Detail & Related papers (2023-08-08T16:31:43Z) - Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
arXiv Detail & Related papers (2022-09-29T17:59:55Z) - SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with
Soft Negative Samples [36.08601841321196]
We propose contrastive learning for unsupervised sentence embedding with soft negative samples.
We show that SNCSE can obtain state-of-the-art performance on semantic textual similarity task.
arXiv Detail & Related papers (2022-01-16T06:15:43Z) - Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z) - Doubly Contrastive Deep Clustering [135.7001508427597]
We present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views.
Specifically, for the sample view, we set the class distribution of the original sample and its augmented version as positive sample pairs.
For the class view, we build the positive and negative pairs from the sample distribution of the class.
In this way, two contrastive losses successfully constrain the clustering results of mini-batch samples in both sample and class level.
arXiv Detail & Related papers (2021-03-09T15:15:32Z) - Contrastive Learning with Hard Negative Samples [80.12117639845678]
We develop a new family of unsupervised sampling methods for selecting hard negative samples.
A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible.
The proposed method improves downstream performance across multiple modalities, requires only few additional lines of code to implement, and introduces no computational overhead.
arXiv Detail & Related papers (2020-10-09T14:18:53Z) - SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding is to learn a latent representation for each node in an unsupervised manner.
A key of success to such contrastive learning methods is how to draw positive and negative samples.
In this paper, we propose SCE for unsupervised network embedding only using negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.