PromptBERT: Improving BERT Sentence Embeddings with Prompts
- URL: http://arxiv.org/abs/2201.04337v1
- Date: Wed, 12 Jan 2022 06:54:21 GMT
- Title: PromptBERT: Improving BERT Sentence Embeddings with Prompts
- Authors: Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang,
Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang
- Abstract summary: We propose a prompt-based sentence embedding method that reduces token embedding biases and makes the original BERT layers more effective.
We also propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings.
Our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings.
- Score: 95.45347849834765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The poor performance of the original BERT for sentence semantic similarity
has been widely discussed in previous works. We find that the unsatisfactory
performance is mainly due to static token embedding biases and ineffective BERT
layers, rather than the high cosine similarity of the sentence embeddings. To
this end, we propose a prompt-based sentence embedding method that reduces
token embedding biases and makes the original BERT layers more effective. By
reformulating the sentence embedding task as a fill-in-the-blank problem, our
method significantly improves the performance of the original BERT. We discuss
two prompt representation methods and three prompt searching methods for
prompt-based sentence embeddings. Moreover, we propose a novel unsupervised
training objective based on template denoising, which substantially narrows the
performance gap between the supervised and unsupervised settings. We evaluate
our method in both non-fine-tuned and fine-tuned settings. Even without
fine-tuning, our method outperforms fine-tuned methods such as unsupervised
ConSERT on STS tasks. Our fine-tuned method outperforms the state-of-the-art
method SimCSE in both unsupervised and supervised settings. Compared to SimCSE,
we achieve improvements of 2.29 and 2.58 points on BERT and RoBERTa
respectively under the unsupervised setting.
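As a concrete illustration of the prompt-based representation described above, the sketch below wraps a sentence in a manual template and reads the embedding off the hidden state at the [MASK] position. This is a minimal sketch, not the authors' released code; the exact template wording, the bert-base-uncased checkpoint, and the use of the Hugging Face transformers API are illustrative assumptions.

    # Minimal sketch: sentence embedding taken from the [MASK] position of a prompt.
    # Template wording and checkpoint are illustrative, not the paper's exact setup.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()

    def prompt_embedding(sentence: str) -> torch.Tensor:
        text = f'This sentence : "{sentence}" means [MASK] .'
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state          # [1, seq_len, dim]
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()
        return hidden[0, mask_pos[0, 0]]                        # [dim]

    a = prompt_embedding("A man is playing a guitar.")
    b = prompt_embedding("Someone is playing an instrument.")
    print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())

The unsupervised objective can then be approximated as SimCSE-style in-batch InfoNCE over two template views of each sentence, with a simplified reading of template denoising in which the embedding produced by the template alone is subtracted as an estimate of the template bias; the paper's exact denoising procedure differs in details omitted here.

    import torch.nn.functional as F

    def template_denoise(h: torch.Tensor, h_template: torch.Tensor) -> torch.Tensor:
        # Simplified denoising: subtract the embedding obtained from the template
        # with no sentence filled in, treated here as the template bias.
        return h - h_template

    def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
        # Standard in-batch InfoNCE between two views of the same batch of sentences.
        z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
        logits = z1 @ z2.T / temperature                         # [batch, batch]
        labels = torch.arange(z1.size(0), device=z1.device)
        return F.cross_entropy(logits, labels)

In this reading, each training sentence would be encoded with two different templates to form a positive pair, and the remaining sentences in the batch serve as in-batch negatives.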
Related papers
- BERTer: The Efficient One [0.0]
We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity.
Our findings reveal substantial improvements in model efficiency and effectiveness when combining multiple fine-tuning architectures.
arXiv Detail & Related papers (2024-07-19T05:33:09Z)
- Sentence Embeddings using Supervised Contrastive Learning [0.0]
We propose a new method to build sentence embeddings through supervised contrastive learning.
Our method fine-tunes pretrained BERT on SNLI data, combining a supervised cross-entropy loss with a supervised contrastive loss.
arXiv Detail & Related papers (2021-06-09T03:30:29Z)
- Self-Guided Contrastive Learning for BERT Sentence Representations [19.205754738851546]
We propose a contrastive learning method that utilizes self-guidance for improving the quality of BERT sentence representations.
Our method fine-tunes BERT in a self-supervised fashion, does not rely on data augmentation, and enables the usual [CLS] token embeddings to function as sentence vectors.
arXiv Detail & Related papers (2021-06-03T05:52:43Z)
- ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer [19.643512923368743]
We present ConSERT, a Contrastive Framework for Self-Supervised Sentence Representation Transfer.
By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations.
Experiments on STS datasets demonstrate that ConSERT achieves an 8% relative improvement over the previous state-of-the-art.
arXiv Detail & Related papers (2021-05-25T08:15:01Z)
- On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z)
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [53.06741585060951]
We propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model.
Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods.
arXiv Detail & Related papers (2020-09-27T10:17:28Z)
- An Unsupervised Sentence Embedding Method by Mutual Information Maximization [34.947950543830686]
BERT is inefficient for sentence-pair tasks such as clustering or semantic search, and Sentence BERT (SBERT) relies on labeled sentence pairs.
We propose a lightweight extension on top of BERT and a novel self-supervised learning objective.
Our method is not restricted by the availability of labeled data, so it can be applied to different domain-specific corpora.
arXiv Detail & Related papers (2020-09-25T07:16:51Z)
- BERT-ATTACK: Adversarial Attack Against BERT Using BERT [77.82947768158132]
Adversarial attacks on discrete data (such as text) are more challenging than those on continuous data (such as images).
We propose BERT-Attack, a high-quality and effective method to generate adversarial samples.
Our method outperforms state-of-the-art attack strategies in both success rate and perturb percentage.
arXiv Detail & Related papers (2020-04-21T13:30:02Z)
- Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation [84.64004917951547]
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP.
In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation.
arXiv Detail & Related papers (2020-02-24T16:17:12Z)
- Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.