InfoCSE: Information-aggregated Contrastive Learning of Sentence
Embeddings
- URL: http://arxiv.org/abs/2210.06432v1
- Date: Sat, 8 Oct 2022 15:53:19 GMT
- Title: InfoCSE: Information-aggregated Contrastive Learning of Sentence
Embeddings
- Authors: Xing Wu, Chaochen Gao, Zijia Lin, Jizhong Han, Zhongyuan Wang, Songlin
Hu
- Abstract summary: This paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE.
We evaluate the proposed InfoCSE on several benchmark datasets with respect to the semantic textual similarity (STS) task.
Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large.
- Score: 61.77760317554826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrastive learning has been extensively studied in sentence embedding
learning, which assumes that the embeddings of different views of the same
sentence should be closer to each other than to those of other sentences. The
constraint brought by this assumption is weak, and a
good sentence representation should also be able to reconstruct the original
sentence fragments. Therefore, this paper proposes an information-aggregated
contrastive learning framework for learning unsupervised sentence embeddings,
termed InfoCSE. InfoCSE forces the representation of the [CLS] position to
aggregate denser sentence information by introducing an additional masked
language model (MLM) task and a well-designed network. We evaluate the proposed
InfoCSE on several benchmark datasets with respect to the semantic textual
similarity (STS) task. Experimental results show that InfoCSE outperforms
SimCSE by an average Spearman correlation of 2.60% on BERT-base and 1.77% on
BERT-large, achieving state-of-the-art results among unsupervised sentence
representation learning methods. Our code is available at
https://github.com/caskcsg/sentemb/info
Related papers
- CMLM-CSE: Based on Conditional MLM Contrastive Learning for Sentence Embeddings [16.592691470405683]
We propose CMLM-CSE, an unsupervised contrastive learning framework based on a conditional MLM loss.
An auxiliary network is added that integrates the sentence embedding into the masked-word prediction task, forcing the sentence embedding to capture more masked-word information.
When BERT-base was used as the pretrained language model, we exceeded SimCSE by 0.55 percentage points on average on textual similarity tasks; with RoBERTa-base, we exceeded SimCSE by 0.3 percentage points on average.
arXiv Detail & Related papers (2023-06-16T02:39:45Z)
- RankCSE: Unsupervised Sentence Representations Learning via Learning to Rank [54.854714257687334]
We propose a novel approach, RankCSE, for unsupervised sentence representation learning.
It incorporates ranking consistency and ranking distillation with contrastive learning into a unified framework.
Extensive experiments are conducted on both semantic textual similarity (STS) and transfer (TR) tasks.
arXiv Detail & Related papers (2023-05-26T08:27:07Z)
- Contrastive Learning of Sentence Embeddings from Scratch [26.002876719243464]
We present SynCSE, a contrastive learning framework that trains sentence embeddings with synthesized data.
Specifically, we explore utilizing large language models to synthesize the required data samples for contrastive learning.
Experimental results on sentence similarity and reranking tasks indicate that both SynCSE-partial and SynCSE-scratch greatly outperform unsupervised baselines.
arXiv Detail & Related papers (2023-05-24T11:56:21Z)
- Instance Smoothed Contrastive Learning for Unsupervised Sentence Embedding [16.598732694215137]
We propose IS-CSE (instance smoothing contrastive sentence embedding) to smooth the boundaries of embeddings in the feature space.
We evaluate our method on standard semantic textual similarity (STS) tasks and achieve average Spearman's correlations of 78.30%, 79.47%, 77.73%, and 79.42%.
arXiv Detail & Related papers (2023-05-12T12:46:13Z)
- M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning).
It introduces open words from WordNet to extend the prompt texts beyond only closed-set label words, so that prompts are tuned in a simulated open-set scenario.
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z)
- DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings [51.274478128525686]
DiffCSE is an unsupervised contrastive learning framework for learning sentence embeddings.
Our experiments show that DiffCSE achieves state-of-the-art results among unsupervised sentence representation learning methods.
arXiv Detail & Related papers (2022-04-21T17:32:01Z)
- ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding [41.09180639504244]
The current state-of-the-art unsupervised method is unsupervised SimCSE (unsup-SimCSE).
We develop a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE).
ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.
arXiv Detail & Related papers (2021-09-09T16:07:31Z)
- Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
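Both InfoCSE and several of the related papers above report Spearman correlation on semantic textual similarity (STS) benchmarks. As a rough illustration of that evaluation protocol, the sketch below embeds each sentence pair, scores it with cosine similarity, and correlates the scores with human ratings. It is a simplification: real evaluations typically aggregate over STS12-16, STS-B, and SICK-R via toolkits such as SentEval, and the toy encoder here is only a placeholder.

```python
# Minimal sketch of STS evaluation: cosine similarity of sentence embeddings
# correlated (Spearman) with gold human similarity scores.
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def sts_spearman(encode, pairs, gold_scores):
    """encode: callable mapping a list of sentences to an (n, dim) tensor.
    pairs: list of (sentence_a, sentence_b); gold_scores: human ratings."""
    a = encode([p[0] for p in pairs])
    b = encode([p[1] for p in pairs])
    cos = F.cosine_similarity(a, b, dim=-1)   # one similarity score per pair
    corr, _ = spearmanr(cos.tolist(), gold_scores)
    return corr

# Toy usage with a random "encoder" just to show the interface.
def toy_encode(sentences):
    return torch.randn(len(sentences), 768)

pairs = [
    ("a man is playing guitar", "a person plays a guitar"),
    ("a dog runs in the park", "a dog is running outside"),
    ("a dog runs in the park", "the stock market fell today"),
]
print(sts_spearman(toy_encode, pairs, gold_scores=[4.8, 4.2, 0.2]))
```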
This list is automatically generated from the titles and abstracts of the papers on this site.