Exploring the Impact of Negative Samples of Contrastive Learning: A Case
Study of Sentence Embedding
- URL: http://arxiv.org/abs/2202.13093v2
- Date: Tue, 1 Mar 2022 12:47:45 GMT
- Title: Exploring the Impact of Negative Samples of Contrastive Learning: A Case
Study of Sentence Embedding
- Authors: Rui Cao, Yihao Wang, Yuxin Liang, Ling Gao, Jie Zheng, Jie Ren, Zheng
Wang
- Abstract summary: We present a momentum contrastive learning model with negative sample queue for sentence embedding, namely MoCoSE.
We define a maximum traceable distance metric, through which we learn to what extent the text contrastive learning benefits from the historical information of negative samples.
Our experiments find that the best results are obtained when the maximum traceable distance is at a certain range, demonstrating that there is an optimal range of historical information for a negative sample queue.
- Score: 14.295787044482136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning is emerging as a powerful technique for extracting
knowledge from unlabeled data. This technique requires a balanced mixture of
two ingredients: positive (similar) and negative (dissimilar) samples. This is
typically achieved by maintaining a queue of negative samples during training.
Prior works in the area typically use a fixed-length negative sample queue,
but how the size of the negative sample queue affects model performance remains
unclear. This opaque impact of the number of negative samples on performance
motivated our in-depth exploration. This paper presents a momentum contrastive
learning model with a negative sample queue for sentence embedding, namely
MoCoSE. We add a prediction layer to the online branch to make the model
asymmetric, which, together with the EMA update mechanism of the target branch,
prevents the model from collapsing. We define a maximum
traceable distance metric, through which we learn to what extent the text
contrastive learning benefits from the historical information of negative
samples. Our experiments find that the best results are obtained when the
maximum traceable distance is at a certain range, demonstrating that there is
an optimal range of historical information for a negative sample queue. We
evaluate the proposed unsupervised MoCoSE on the semantic text similarity (STS)
task and obtain an average Spearman's correlation of $77.27\%$. Source code is
available at https://github.com/xbdxwyh/mocose.
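The two mechanisms the abstract describes, an EMA-updated target branch and a fixed-length negative sample queue, follow the general momentum-contrast recipe. A minimal illustrative sketch in plain Python (class and method names are hypothetical, and the toy momentum value is an assumption, not the authors' implementation):

```python
from collections import deque

import numpy as np

class MomentumQueue:
    """Toy sketch of a momentum-contrast target branch with a negative sample queue."""

    def __init__(self, dim, queue_size, momentum=0.99):
        self.online = np.zeros(dim)   # online-branch parameters (toy: one vector)
        self.target = np.zeros(dim)   # target-branch parameters, updated only via EMA
        self.momentum = momentum
        # fixed-length FIFO of past target-branch embeddings used as negatives
        self.queue = deque(maxlen=queue_size)

    def ema_update(self):
        # target <- m * target + (1 - m) * online; no gradients reach the target branch
        self.target = self.momentum * self.target + (1 - self.momentum) * self.online

    def enqueue(self, embeddings):
        # push the current batch's embeddings as future negatives; once the queue
        # is full, the oldest entries are dropped automatically
        for e in embeddings:
            self.queue.append(np.asarray(e))

    def negatives(self):
        # all queued negatives as a (queue_len, dim) array
        return np.stack(list(self.queue))
```

Older queue entries were produced by older versions of the encoder; the paper's maximum traceable distance metric quantifies how far back this usable history reaches.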