Smoothed Contrastive Learning for Unsupervised Sentence Embedding
- URL: http://arxiv.org/abs/2109.04321v1
- Date: Thu, 9 Sep 2021 14:54:24 GMT
- Title: Smoothed Contrastive Learning for Unsupervised Sentence Embedding
- Authors: Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang,
Songlin Hu
- Abstract summary: We introduce a smoothing strategy upon the InfoNCE loss function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE).
GS-InfoNCE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%, 0.72%, 1.17% and 0.28% on the base of BERT-base, BERT-large, RoBERTa-base and RoBERTa-large, respectively.
- Score: 41.09180639504244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has been gradually applied to learn high-quality
unsupervised sentence embedding. Among the previous unsupervised methods, the
latest state-of-the-art method, as far as we know, is unsupervised SimCSE
(unsup-SimCSE). Unsup-SimCSE uses the InfoNCE loss function in the training
stage by pulling semantically similar sentences together and pushing apart
dissimilar ones. Theoretically, we expect to use larger batches in unsup-SimCSE
to get more adequate comparisons among samples and avoid overfitting. However,
increasing the batch size does not always lead to improvements and can even
degrade performance when the batch size exceeds a threshold. Through
statistical observation, we find that this is probably due to the introduction
of low-confidence negative pairs after increasing the batch size. To alleviate
this problem, we introduce a simple smoothing strategy upon the InfoNCE loss
function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE). Specifically, we add
random Gaussian noise vectors as negative samples, which act as a smoothing of
the negative sample space. Though simple, the proposed smoothing strategy
brings substantial improvements to unsup-SimCSE. We evaluate GS-InfoNCE on the
standard semantic text similarity (STS) task. GS-InfoNCE outperforms the
state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%,
0.72%, 1.17% and 0.28% on the base of BERT-base, BERT-large, RoBERTa-base and
RoBERTa-large, respectively.
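The strategy described in the abstract can be made concrete with a short sketch. The following PyTorch snippet is not the authors' code; the function name gs_infonce_loss, the noise count, and the temperature value are assumptions for illustration. It appends random Gaussian noise vectors to the in-batch negatives of an InfoNCE loss:

```python
import torch
import torch.nn.functional as F

def gs_infonce_loss(z1, z2, num_noise=64, noise_std=1.0, temperature=0.05):
    """Illustrative GS-InfoNCE: InfoNCE over two views (z1, z2) of a batch,
    with random Gaussian vectors appended as extra negatives.
    Hyperparameter names and defaults are assumptions, not the paper's values."""
    batch_size, dim = z1.shape
    # Random Gaussian noise vectors act as a smoothing of the negative sample space.
    noise = torch.randn(num_noise, dim, device=z1.device) * noise_std

    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    noise = F.normalize(noise, dim=-1)

    # Similarities to in-batch candidates (positives on the diagonal) ...
    sim_batch = z1 @ z2.T / temperature     # (B, B)
    # ... and to the Gaussian noise negatives.
    sim_noise = z1 @ noise.T / temperature  # (B, num_noise)

    logits = torch.cat([sim_batch, sim_noise], dim=1)
    labels = torch.arange(batch_size, device=z1.device)
    return F.cross_entropy(logits, labels)
```

In unsup-SimCSE, z1 and z2 would be two dropout-perturbed encodings of the same batch of sentences; the noise columns only enlarge the softmax denominator and leave the positive pairs untouched.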
Related papers
- Instance Smoothed Contrastive Learning for Unsupervised Sentence
Embedding [16.598732694215137]
We propose IS-CSE (instance smoothing contrastive sentence embedding) to smooth the boundaries of embeddings in the feature space.
We evaluate our method on standard semantic text similarity (STS) tasks and achieve an average of 78.30%, 79.47%, 77.73%, and 79.42% Spearman's correlation.
arXiv Detail & Related papers (2023-05-12T12:46:13Z)
- Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability [85.1044381834036]
We investigate the implicit regularization effects of label noises under mini-batch sampling settings of gradient descent.
We find such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters.
Our work does not assume SGD is an Ornstein-Uhlenbeck-like process and achieves a more general result, with convergence of the approximation proved.
arXiv Detail & Related papers (2023-04-01T14:09:07Z)
- Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids the arbitrary tuning from a mini-batch of samples used in previous methods.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
- Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
arXiv Detail & Related papers (2022-09-29T17:59:55Z)
- Improving Contrastive Learning of Sentence Embeddings with
Case-Augmented Positives and Retrieved Negatives [17.90820242798732]
Unsupervised contrastive learning methods still lag far behind their supervised counterparts.
We propose switch-case augmentation to flip the case of the first letter of randomly selected words in a sentence.
For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model.
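As a rough illustration of the switch-case augmentation described above (whitespace tokenization and the flip probability p are assumptions for this sketch, not details from the paper):

```python
import random

def switch_case_augment(sentence: str, p: float = 0.15) -> str:
    """Flip the case of the first letter of randomly selected words.
    The probability p and whitespace tokenization are illustrative choices."""
    words = sentence.split()
    augmented = []
    for word in words:
        if word and random.random() < p:
            word = word[0].swapcase() + word[1:]  # flip only the first letter
        augmented.append(word)
    return " ".join(augmented)

# Example: "the Cat sat" might become "The cat sat" for one random draw.
```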
arXiv Detail & Related papers (2022-06-06T09:46:12Z)
- Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z)
- S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence
Embedding [2.9894971434911266]
Contrastive learning has been studied for improving the performance of learning sentence embeddings.
The current state-of-the-art method is the SimCSE, which takes dropout as the data augmentation method.
S-SimCSE outperforms the state-of-the-art SimCSE by more than 1% on BERT-base.
arXiv Detail & Related papers (2021-11-23T09:52:45Z)
- ESimCSE: Enhanced Sample Building Method for Contrastive Learning of
Unsupervised Sentence Embedding [41.09180639504244]
The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE).
We develop a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE).
ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.
arXiv Detail & Related papers (2021-09-09T16:07:31Z)
- SimCSE: Simple Contrastive Learning of Sentence Embeddings [10.33373737281907]
This paper presents SimCSE, a contrastive learning framework for sentence embeddings.
We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective.
We then incorporate annotated pairs from NLI datasets into contrastive learning by using "entailment" pairs as positives and "contradiction" pairs as hard negatives.
arXiv Detail & Related papers (2021-04-18T11:27:08Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate (a rough sketch follows this entry).
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
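As a loose illustration of the experience-replay idea in the entry above, the sketch below draws SGD updates for least-squares regression from a buffer of past samples instead of consecutive, correlated Markovian observations; the buffer size, step count, learning rate, and update rule are illustrative assumptions, not the paper's algorithm.

```python
import random
import numpy as np

def replay_sgd(stream, dim, buffer_size=1000, lr=0.01, steps=10000):
    """Least-squares SGD with experience replay: store observed (x, y) pairs
    in a buffer and update on uniformly re-sampled pairs, which weakens the
    temporal correlation of Markovian data. All hyperparameters are illustrative."""
    w = np.zeros(dim)
    buffer = []
    for _ in range(steps):
        x, y = next(stream)               # correlated sample from the Markov chain
        buffer.append((x, y))
        if len(buffer) > buffer_size:
            buffer.pop(0)
        x_r, y_r = random.choice(buffer)  # decorrelated sample from the replay buffer
        grad = (w @ x_r - y_r) * x_r      # least-squares gradient on the replayed pair
        w -= lr * grad
    return w
```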
This list is automatically generated from the titles and abstracts of the papers on this site.