ConSERT: A Contrastive Framework for Self-Supervised Sentence
Representation Transfer
- URL: http://arxiv.org/abs/2105.11741v1
- Date: Tue, 25 May 2021 08:15:01 GMT
- Title: ConSERT: A Contrastive Framework for Self-Supervised Sentence
Representation Transfer
- Authors: Yuanmeng Yan, Rumei Li, Sirui Wang, Fuzheng Zhang, Wei Wu and Weiran
Xu
- Abstract summary: We present ConSERT, a Contrastive Framework for Self-Supervised Sentence Representation Transfer.
By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations.
Experiments on STS datasets demonstrate that ConSERT achieves an 8% relative improvement over the previous state-of-the-art.
- Score: 19.643512923368743
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning high-quality sentence representations benefits a wide range of
natural language processing tasks. Although BERT-based pre-trained language
models achieve high performance on many downstream tasks, the sentence
representations derived natively from them are shown to be collapsed and thus
perform poorly on semantic textual similarity (STS) tasks. In this paper, we
present ConSERT, a Contrastive Framework for Self-Supervised Sentence
Representation Transfer, which adopts contrastive learning to fine-tune BERT in
an unsupervised and effective way. By making use of unlabeled texts, ConSERT
solves the collapse issue of BERT-derived sentence representations and makes
them more applicable to downstream tasks. Experiments on STS datasets
demonstrate that ConSERT achieves an 8% relative improvement over the previous
state-of-the-art and is even comparable to the supervised SBERT-NLI. When NLI
supervision is further incorporated, we achieve new state-of-the-art
performance on STS tasks. Moreover, ConSERT obtains comparable results with
only 1000 samples available, showing its robustness in data-scarcity scenarios.
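To make the idea concrete, here is a minimal sketch of unsupervised contrastive fine-tuning of BERT sentence embeddings in the spirit of the abstract. It is illustrative only: the backbone name, the dropout-noise "augmentation" (two stochastic forward passes), the mean pooling, and the temperature are assumptions, not the paper's exact configuration (the paper itself studies several augmentation strategies applied at the embedding layer).

```python
# Minimal sketch: contrastive (NT-Xent) fine-tuning of BERT sentence
# embeddings on unlabeled text. Assumed choices for illustration:
# bert-base-uncased backbone, mean pooling, dropout noise as augmentation.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"                      # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

def embed(sentences):
    """Mean-pool token states into sentence vectors."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state       # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)       # (B, H)

def nt_xent(z1, z2, temperature=0.1):
    """Each sentence's two views are positives; the other in-batch
    sentences act as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                # (B, B) similarities
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

# One illustrative training step on unlabeled sentences: the two "views"
# come from the encoder's own dropout noise (encoder kept in train mode).
sentences = ["a man is playing a guitar", "two dogs run in the park"]
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-5)
encoder.train()
loss = nt_xent(embed(sentences), embed(sentences))
loss.backward()
optimizer.step()
```

After fine-tuning, STS-style evaluation would simply score sentence pairs by the cosine similarity of their pooled vectors.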
Related papers
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
The proposed method is quite simple and can be easily extended to various state-of-the-art models to boost performance.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, under extremely sparse replay or even no replay.
arXiv Detail & Related papers (2023-03-02T09:03:43Z)
- PromptBERT: Improving BERT Sentence Embeddings with Prompts [95.45347849834765]
We propose a prompt-based sentence embedding method that can reduce token embedding biases and make the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
arXiv Detail & Related papers (2022-01-12T06:54:21Z)
- Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks [4.955649816620742]
This paper explores sentence embedding models for BERT and ALBERT.
We take a modified BERT network with Siamese and triplet network structures, called Sentence-BERT (SBERT), and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT); a minimal sketch of this Siamese bi-encoder setup appears after this list.
arXiv Detail & Related papers (2021-01-26T09:14:06Z)
- To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging [46.62643525729018]
We study task-specific Cross-View Training (CVT) and compare it with task-agnostic BERT in multiple settings that include domain- and task-relevant English data.
We show that CVT achieves performance similar to BERT on a set of sequence tagging tasks, with a lower financial and environmental impact.
arXiv Detail & Related papers (2020-10-27T04:03:47Z)
- An Unsupervised Sentence Embedding Method by Mutual Information Maximization [34.947950543830686]
BERT is inefficient for sentence-pair tasks such as clustering or semantic search.
We propose a lightweight extension on top of BERT and a novel self-supervised learning objective.
Our method is not restricted by the availability of labeled data and can be applied to different domain-specific corpora.
arXiv Detail & Related papers (2020-09-25T07:16:51Z)
- Exploring Cross-sentence Contexts for Named Entity Recognition with BERT [1.4998865865537996]
We present a study exploring the use of cross-sentence information for NER using BERT models in five languages.
We find that adding context in the form of additional sentences to BERT input increases NER performance on all of the tested languages and models.
We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate that it further increases NER performance with BERT.
arXiv Detail & Related papers (2020-06-02T12:34:52Z)
- CERT: Contrastive Self-supervised Learning for Language Understanding [20.17416958052909]
We propose CERT: Contrastive self-supervised Representations from Transformers.
CERT pretrains language representation models using contrastive self-supervised learning at the sentence level.
We evaluate CERT on 11 natural language understanding tasks in the GLUE benchmark where CERT outperforms BERT on 7 tasks, achieves the same performance as BERT on 2 tasks, and performs worse than BERT on 2 tasks.
arXiv Detail & Related papers (2020-05-16T16:20:38Z)
- Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning [31.897630023454067]
We propose a novel deep bidirectional language model called the Transformer-based Text Autoencoder (T-TA).
The T-TA computes contextual language representations without repetition and has the benefits of a deep bidirectional architecture like BERT.
In run-time experiments on CPU environments, the proposed T-TA performs over six times faster than the BERT-based model in the reranking task and twelve times faster in the semantic similarity task.
arXiv Detail & Related papers (2020-04-17T07:43:38Z)
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
arXiv Detail & Related papers (2020-02-24T16:17:12Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm, the BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
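As referenced in the SBERT/SALBERT entry above, the Siamese (bi-encoder) setup that several of these papers build on can be sketched in a few lines. This is a generic illustration, not any single paper's implementation; the backbone and pooling choices are assumptions, and an ALBERT checkpoint could be substituted to obtain a SALBERT-style variant.

```python
# Minimal sketch of a Siamese (shared-weight) sentence encoder: both
# sentences pass through the same model, token states are mean-pooled,
# and the pair is scored by cosine similarity.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

backbone = "bert-base-uncased"   # e.g. "albert-base-v2" for an ALBERT variant
tokenizer = AutoTokenizer.from_pretrained(backbone)
encoder = AutoModel.from_pretrained(backbone)
encoder.eval()

def encode(sentences):
    """Shared encoder + mean pooling over non-padding tokens."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)

# Score a sentence pair with the shared ("Siamese") encoder.
a = encode(["A man is playing a guitar."])
b = encode(["Someone is playing an instrument."])
print(F.cosine_similarity(a, b).item())
```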