Smoothed Contrastive Learning for Unsupervised Sentence Embedding
- URL: http://arxiv.org/abs/2109.04321v1
- Date: Thu, 9 Sep 2021 14:54:24 GMT
- Title: Smoothed Contrastive Learning for Unsupervised Sentence Embedding
- Authors: Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang,
Songlin Hu
- Abstract summary: We introduce a smoothing strategy upon the InfoNCE loss function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE).
GS-InfoNCE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%, 0.72%, 1.17% and 0.28% on the base of BERT-base, BERT-large, RoBERTa-base and RoBERTa-large, respectively.
- Score: 41.09180639504244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has been gradually applied to learn high-quality
unsupervised sentence embedding. Among the previous unsupervised methods, the
latest state-of-the-art method, as far as we know, is unsupervised SimCSE
(unsup-SimCSE). Unsup-SimCSE uses the InfoNCE loss function in the training
stage by pulling semantically similar sentences together and pushing apart
dissimilar ones. Theoretically, we expect to use larger batches in unsup-SimCSE
to get more adequate comparisons among samples and avoid overfitting. However,
increasing the batch size does not always lead to improvements and can even
degrade performance when the batch size exceeds a threshold. Through
statistical observation, we find that this is probably due to the introduction
of low-confidence negative pairs after increasing the batch size. To alleviate
this problem, we introduce a simple smoothing strategy upon the InfoNCE loss
function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE). Specifically, we add
random Gaussian noise vectors as negative samples, which act as a smoothing of
the negative sample space. Though simple, the proposed smoothing strategy
brings substantial improvements to unsup-SimCSE. We evaluate GS-InfoNCE on the
standard semantic text similarity (STS) task. GS-InfoNCE outperforms the
state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%,
0.72%, 1.17% and 0.28% on the base of BERT-base, BERT-large, RoBERTa-base and
RoBERTa-large, respectively.
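The strategy described in the abstract can be made concrete with a short sketch. The following PyTorch snippet is not the authors' code; the function name gs_infonce_loss, the noise count, and the temperature value are assumptions for illustration. It appends random Gaussian noise vectors to the in-batch negatives of an InfoNCE loss:

```python
import torch
import torch.nn.functional as F

def gs_infonce_loss(z1, z2, num_noise=64, noise_std=1.0, temperature=0.05):
    """Illustrative GS-InfoNCE: InfoNCE over two views (z1, z2) of a batch,
    with random Gaussian vectors appended as extra negatives.
    Hyperparameter names and defaults are assumptions, not the paper's values."""
    batch_size, dim = z1.shape
    # Random Gaussian noise vectors act as a smoothing of the negative sample space.
    noise = torch.randn(num_noise, dim, device=z1.device) * noise_std

    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    noise = F.normalize(noise, dim=-1)

    # Similarities to in-batch candidates (positives on the diagonal) ...
    sim_batch = z1 @ z2.T / temperature     # (B, B)
    # ... and to the Gaussian noise negatives.
    sim_noise = z1 @ noise.T / temperature  # (B, num_noise)

    logits = torch.cat([sim_batch, sim_noise], dim=1)
    labels = torch.arange(batch_size, device=z1.device)
    return F.cross_entropy(logits, labels)
```

In unsup-SimCSE, z1 and z2 would be two dropout-perturbed encodings of the same batch of sentences; the noise columns only enlarge the softmax denominator and leave the positive pairs untouched.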
Related papers
- Instance Smoothed Contrastive Learning for Unsupervised Sentence
Embedding [16.598732694215137]
We propose IS-CSE (instance smoothing contrastive sentence embedding) to smooth the boundaries of embeddings in the feature space.
We evaluate our method on standard semantic text similarity (STS) tasks and achieve an average of 78.30%, 79.47%, 77.73%, and 79.42% Spearman's correlation.
arXiv Detail & Related papers (2023-05-12T12:46:13Z)
- Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability [85.1044381834036]
We investigate the implicit regularization effects of label noises under mini-batch sampling settings of gradient descent.
We find such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters.
Our work does not assume SGD is an Ornstein-Uhlenbeck-like process and achieves a more general result, with convergence of the approximation proved.
arXiv Detail & Related papers (2023-04-01T14:09:07Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids the arbitrary tuning from a mini-batch of samples used in previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
arXiv Detail & Related papers (2022-09-29T17:59:55Z)
- Improving Contrastive Learning of Sentence Embeddings with
Case-Augmented Positives and Retrieved Negatives [17.90820242798732]
Unsupervised contrastive learning methods still lag far behind their supervised counterparts.
We propose switch-case augmentation to flip the case of the first letter of randomly selected words in a sentence.
For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model.
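As a rough illustration of the switch-case augmentation described above (whitespace tokenization and the flip probability p are assumptions for this sketch, not details from the paper):

```python
import random

def switch_case_augment(sentence: str, p: float = 0.15) -> str:
    """Flip the case of the first letter of randomly selected words.
    The probability p and whitespace tokenization are illustrative choices."""
    words = sentence.split()
    augmented = []
    for word in words:
        if word and random.random() < p:
            word = word[0].swapcase() + word[1:]  # flip only the first letter
        augmented.append(word)
    return " ".join(augmented)

# Example: "the Cat sat" might become "The cat sat" for one random draw.
```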
arXiv Detail & Related papers (2022-06-06T09:46:12Z)
- Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z)
- S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence
Embedding [2.9894971434911266]
Contrastive learning has been studied for improving the performance of learning sentence embeddings.
The current state-of-the-art method is the SimCSE, which takes dropout as the data augmentation method.
S-SimCSE outperforms the state-of-the-art SimCSE by more than 1% on BERT-base.
arXiv Detail & Related papers (2021-11-23T09:52:45Z)
- ESimCSE: Enhanced Sample Building Method for Contrastive Learning of
Unsupervised Sentence Embedding [41.09180639504244]
The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE).
We develop a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE).
ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.
arXiv Detail & Related papers (2021-09-09T16:07:31Z)
- SimCSE: Simple Contrastive Learning of Sentence Embeddings [10.33373737281907]
This paper presents SimCSE, a contrastive learning framework for sentence embeddings.
We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective.
We then incorporate annotated pairs from NLI datasets into contrastive learning by using "entailment" pairs as positives and "contradiction" pairs as hard negatives.
arXiv Detail & Related papers (2021-04-18T11:27:08Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate (a rough sketch follows this entry).
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
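As a loose illustration of the experience-replay idea in the entry above, the sketch below draws SGD updates for least-squares regression from a buffer of past samples instead of consecutive, correlated Markovian observations; the buffer size, step count, learning rate, and update rule are illustrative assumptions, not the paper's algorithm.

```python
import random
import numpy as np

def replay_sgd(stream, dim, buffer_size=1000, lr=0.01, steps=10000):
    """Least-squares SGD with experience replay: store observed (x, y) pairs
    in a buffer and update on uniformly re-sampled pairs, which weakens the
    temporal correlation of Markovian data. All hyperparameters are illustrative."""
    w = np.zeros(dim)
    buffer = []
    for _ in range(steps):
        x, y = next(stream)               # correlated sample from the Markov chain
        buffer.append((x, y))
        if len(buffer) > buffer_size:
            buffer.pop(0)
        x_r, y_r = random.choice(buffer)  # decorrelated sample from the replay buffer
        grad = (w @ x_r - y_r) * x_r      # least-squares gradient on the replayed pair
        w -= lr * grad
    return w
```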
This list is automatically generated from the titles and abstracts of the papers on this site.