S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence
Embedding
- URL: http://arxiv.org/abs/2111.11750v2
- Date: Wed, 24 Nov 2021 09:20:44 GMT
- Title: S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence
Embedding
- Authors: Junlei Zhang, Zhenzhong Lan
- Abstract summary: Contrastive learning has been studied for improving the performance of learning sentence embeddings.
The current state-of-the-art method is the SimCSE, which takes dropout as the data augmentation method.
S-SimCSE outperforms the state-of-the-art SimCSE by more than $1\%$ on BERT$_{base}$.
- Score: 2.9894971434911266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive learning has been studied for improving the performance of
learning sentence embeddings. The current state-of-the-art method is the
SimCSE, which takes dropout as the data augmentation method and feeds a
pre-trained transformer encoder the same input sentence twice. The
corresponding outputs, two sentence embeddings derived from the same sentence
with different dropout masks, can be used to build a positive pair. A network
to which a dropout mask is applied can be regarded as a sub-network of itself,
whose expected scale is determined by the dropout rate. In this paper, we push
sub-networks with different expected scales to learn similar embeddings for the
same sentence. SimCSE cannot do so because it fixes the dropout rate to a tuned
hyperparameter. We achieve this by sampling the dropout rate from a
distribution for each forward pass. As this may make optimization harder, we
also propose a simple sentence-wise mask strategy to sample more sub-networks.
We evaluated the proposed S-SimCSE on several popular semantic textual
similarity datasets. Experimental results show that S-SimCSE outperforms the
state-of-the-art SimCSE by more than $1\%$ on BERT$_{base}$.
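A minimal sketch of the core idea (illustrative PyTorch, not the authors' released code): sample the dropout rate anew for each forward pass so the two views of a sentence come from sub-networks with different expected scales, then train with the usual in-batch InfoNCE objective. The toy encoder, uniform sampling range, and temperature below are assumptions for illustration; the paper's sentence-wise mask strategy (not shown) additionally samples masks at the sentence level to obtain more sub-networks.
```python
# Sketch of S-SimCSE-style training: two views of the same sentences are built
# with independently sampled dropout rates and pulled together with InfoNCE.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a pre-trained transformer encoder such as BERT-base."""
    def __init__(self, dim=768):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, dropout_rate):
        # Dropout with a freshly sampled rate -> a sub-network of the encoder.
        return F.dropout(self.proj(x), p=dropout_rate, training=True)

def s_simcse_loss(encoder, sent_repr, max_rate=0.2, temperature=0.05):
    # Two independent dropout rates per forward process (uniform sampling assumed).
    p1 = random.uniform(0.0, max_rate)
    p2 = random.uniform(0.0, max_rate)
    z1 = encoder(sent_repr, p1)          # view 1
    z2 = encoder(sent_repr, p2)          # view 2 (positive of view 1)
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(sent_repr.size(0))   # i-th sentence matches its own positive
    return F.cross_entropy(sim, labels)        # InfoNCE over in-batch negatives

encoder = ToyEncoder()
batch = torch.randn(8, 768)              # stand-in for pooled sentence features
loss = s_simcse_loss(encoder, batch)
loss.backward()
```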
Related papers
- Simulation-free Schrödinger bridges via score and flow matching [89.4231207928885]
We present simulation-free score and flow matching ([SF]$^2$M).
Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous flows.
Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
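The summary above does not give the [SF]$^2$M objective itself; as a reference point, the standard conditional flow-matching loss it is said to generalize regresses a learned velocity field onto the velocity of a straight-line path between noise and data. A minimal generic sketch (the module and function names are illustrative, and this is not the [SF]$^2$M method):
```python
# Generic conditional flow matching loss, NOT the [SF]^2M method itself.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

def flow_matching_loss(model, x1):
    # x1: data samples; x0: noise; x_t lies on the straight path between them.
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.size(0), 1)
    x_t = (1 - t) * x0 + t * x1
    target_velocity = x1 - x0                  # velocity of the straight-line path
    pred = model(x_t, t)
    return ((pred - target_velocity) ** 2).mean()

model = VelocityNet()
loss = flow_matching_loss(model, torch.randn(16, 2))
loss.backward()
```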
arXiv Detail & Related papers (2023-07-07T15:42:35Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings [61.77760317554826]
This paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE.
We evaluate the proposed InfoCSE on several benchmark datasets for the semantic textual similarity (STS) task.
Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large.
arXiv Detail & Related papers (2022-10-08T15:53:19Z)
- Improving Contrastive Learning of Sentence Embeddings with Case-Augmented Positives and Retrieved Negatives [17.90820242798732]
Unsupervised contrastive learning methods still lag far behind their supervised counterparts.
We propose switch-case augmentation to flip the case of the first letter of randomly selected words in a sentence.
For negative samples, we sample hard negatives from the whole dataset based on a pre-trained language model.
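A minimal sketch of the switch-case augmentation described in this entry; the word-selection probability is an illustrative assumption, not a value from the paper:
```python
# Flip the case of the first letter of randomly selected words in a sentence.
import random

def switch_case_augment(sentence: str, select_prob: float = 0.15) -> str:
    augmented = []
    for word in sentence.split():
        if word and random.random() < select_prob:
            word = word[0].swapcase() + word[1:]
        augmented.append(word)
    return " ".join(augmented)

print(switch_case_augment("contrastive learning of sentence embeddings"))
# e.g. "contrastive Learning of sentence Embeddings"
```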
arXiv Detail & Related papers (2022-06-06T09:46:12Z)
- Dual Lottery Ticket Hypothesis [71.95937879869334]
Lottery Ticket Hypothesis (LTH) provides a novel view to investigate sparse network training and maintain its capacity.
In this work, we regard the winning ticket from LTH as the sub-network that is in a trainable condition, and take its performance as our benchmark.
We propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our Dual Lottery Ticket Hypothesis (DLTH).
arXiv Detail & Related papers (2022-03-08T18:06:26Z)
- ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding [41.09180639504244]
The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE).
We develop a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE).
ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.
arXiv Detail & Related papers (2021-09-09T16:07:31Z)
- Smoothed Contrastive Learning for Unsupervised Sentence Embedding [41.09180639504244]
We introduce a smoothing strategy upon the InfoNCE loss function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE).
GS-InfoNCE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%, 0.72%, 1.17%, and 0.28% on BERT-base, BERT-large, RoBERTa-base, and RoBERTa-large, respectively.
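The summary does not spell out the smoothing mechanism. Below is a minimal sketch under one assumed reading of the name, in which random Gaussian noise vectors are appended as extra negatives to the InfoNCE denominator; the function name, noise count, and temperature are illustrative, not the paper's values:
```python
# Assumed GS-InfoNCE-style loss: Gaussian noise vectors act as extra negatives.
import torch
import torch.nn.functional as F

def gs_infonce(z1, z2, num_noise=16, temperature=0.05):
    # z1, z2: two views of the sentences in a batch, shape (batch, dim).
    noise = torch.randn(num_noise, z1.size(1))                 # smoothing negatives (assumption)
    pos_and_neg = torch.cat([z2, noise], dim=0)                # (batch + num_noise, dim)
    sim = F.cosine_similarity(z1.unsqueeze(1), pos_and_neg.unsqueeze(0), dim=-1)
    labels = torch.arange(z1.size(0))                          # positives sit on the diagonal
    return F.cross_entropy(sim / temperature, labels)

loss = gs_infonce(torch.randn(8, 768), torch.randn(8, 768))
```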
arXiv Detail & Related papers (2021-09-09T14:54:24Z)
- SimCSE: Simple Contrastive Learning of Sentence Embeddings [10.33373737281907]
This paper presents SimCSE, a simple contrastive learning framework for sentence embeddings.
We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective.
We then incorporate annotated pairs from NLI datasets into contrastive learning by using "entailment" pairs as positives and "contradiction" pairs as hard negatives.
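A minimal sketch of that supervised objective (illustrative PyTorch, not the authors' implementation): each entailment hypothesis is the positive for its premise, the corresponding contradiction hypothesis is a hard negative, and the other in-batch pairs act as additional negatives.
```python
# Supervised contrastive loss with in-batch negatives plus hard negatives.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(h, h_pos, h_neg, temperature=0.05):
    # h: premise embeddings, h_pos: entailment embeddings (positives),
    # h_neg: contradiction embeddings (hard negatives); all shaped (batch, dim).
    sim_pos = F.cosine_similarity(h.unsqueeze(1), h_pos.unsqueeze(0), dim=-1)
    sim_neg = F.cosine_similarity(h.unsqueeze(1), h_neg.unsqueeze(0), dim=-1)
    logits = torch.cat([sim_pos, sim_neg], dim=1) / temperature  # (batch, 2 * batch)
    labels = torch.arange(h.size(0))          # the i-th entailment is the correct class
    return F.cross_entropy(logits, labels)

loss = supervised_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768), torch.randn(8, 768))
```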
arXiv Detail & Related papers (2021-04-18T11:27:08Z)
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators [108.3381301768299]
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
We propose a more sample-efficient pre-training task called replaced token detection.
arXiv Detail & Related papers (2020-03-23T21:17:42Z)
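A schematic of the replaced-token-detection idea (toy dimensions and modules, not the ELECTRA architecture or codebase): a generator proposes replacements at masked positions, and the discriminator is trained to classify every token as original or replaced.
```python
# Simplified replaced-token-detection step with toy tensors.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden, batch, seq = 1000, 64, 4, 16
generator = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, vocab))
discriminator = nn.Sequential(nn.Embedding(vocab, hidden), nn.Linear(hidden, 1))

tokens = torch.randint(0, vocab, (batch, seq))
mask = torch.rand(batch, seq) < 0.15                     # positions to corrupt

# Generator proposes plausible replacements at the masked positions
# (its own masked-language-modeling loss is omitted for brevity).
gen_logits = generator(tokens)
sampled = torch.distributions.Categorical(logits=gen_logits).sample()
corrupted = torch.where(mask, sampled, tokens)

# Discriminator predicts, for every token, whether it was replaced.
is_replaced = (corrupted != tokens).float()
disc_logits = discriminator(corrupted).squeeze(-1)
loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
loss.backward()
```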