Deep Continuous Prompt for Contrastive Learning of Sentence Embeddings
- URL: http://arxiv.org/abs/2203.06875v1
- Date: Mon, 14 Mar 2022 06:07:44 GMT
- Title: Deep Continuous Prompt for Contrastive Learning of Sentence Embeddings
- Authors: Yuxin Jiang and Wei Wang
- Abstract summary: We present a novel method which freezes the whole language model and optimizes only the prefix deep continuous prompts.
It not only tunes around 0.1% of the original language model's parameters, but also avoids the cumbersome computation of searching for handcrafted prompts.
Our proposed DCPCSE outperforms the state-of-the-art method SimCSE by a large margin.
- Score: 8.70715711885114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of sentence representation has been remarkably improved by
the framework of contrastive learning. However, recent works still require full
fine-tuning, which is quite inefficient for large-scale pre-trained language
models. To this end, we present a novel method which freezes the whole language
model and only optimizes the prefix deep continuous prompts. It not only tunes
around 0.1% of the original language model's parameters, but also avoids the
cumbersome computation of searching for handcrafted prompts. Experimental results
show that our proposed DCPCSE outperforms the state-of-the-art method SimCSE by
a large margin. We raise the performance of unsupervised BERT$_{base}$ and
supervised RoBERTa$_{large}$ by 2.24 and 1.00 points, respectively. Our code is
publicly available at https://github.com/YJiangcm/DCPCSE
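To make the mechanism concrete, here is a minimal sketch of the idea (not the authors' released code): the language model is frozen and only per-layer prefix key/value prompts are trained, with a SimCSE-style InfoNCE objective. It assumes a transformers version whose BertModel accepts per-layer `past_key_values` tuples, as P-tuning v2-style prefix implementations do; `n_prefix`, the initialization scale, and the [CLS] pooling choice are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

for p in model.parameters():          # freeze the whole language model
    p.requires_grad = False

cfg = model.config
n_prefix = 16                         # prefix length per layer (assumed)
head_dim = cfg.hidden_size // cfg.num_attention_heads

# Deep continuous prompts: trainable key/value prefixes for every layer.
prefix = torch.nn.Parameter(
    0.02 * torch.randn(cfg.num_hidden_layers, 2,
                       cfg.num_attention_heads, n_prefix, head_dim))

def encode(sentences):
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    bsz = batch["input_ids"].size(0)
    past = [(prefix[i, 0].expand(bsz, -1, -1, -1),
             prefix[i, 1].expand(bsz, -1, -1, -1))
            for i in range(cfg.num_hidden_layers)]
    # The attention mask must also cover the prefix positions.
    ones = torch.ones(bsz, n_prefix, dtype=batch["attention_mask"].dtype)
    mask = torch.cat([ones, batch["attention_mask"]], dim=1)
    out = model(input_ids=batch["input_ids"], attention_mask=mask,
                past_key_values=past)
    return out.last_hidden_state[:, 0]        # [CLS] sentence embedding

def infonce(z1, z2, temperature=0.05):
    # Two dropout-augmented views of the same sentences; in-batch negatives.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1)
    return F.cross_entropy(sim / temperature, torch.arange(z1.size(0)))
```

Only `prefix` receives gradients here, which is how the trainable-parameter count stays near 0.1% of the full model.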
Related papers
- Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning [50.26965628047682]
Adapting pre-trained models to open classes is a challenging problem in machine learning.
In this paper, we consider combining the advantages of both and propose a test-time prompt tuning approach.
Our proposed method outperforms all comparison methods on average considering both base and new classes.
arXiv Detail & Related papers (2024-08-29T12:34:01Z)
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification (CTC) for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a speed-up of 5.67$\times$.
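As a rough illustration of CTC supervision over encoder states (a single-encoder sketch; the paper's design uses two CTC-guided encoders for source and target text, and all shapes and names here are assumptions):

```python
import torch
import torch.nn as nn

vocab_size, blank_id = 1000, 0        # assumed vocabulary and blank index
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=6)
proj = nn.Linear(256, vocab_size)     # per-frame token logits
ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

speech = torch.randn(8, 120, 256)     # (batch, frames, features), dummy input
targets = torch.randint(1, vocab_size, (8, 30))
input_lens = torch.full((8,), 120, dtype=torch.long)
target_lens = torch.full((8,), 30, dtype=torch.long)

log_probs = proj(encoder(speech)).log_softmax(-1)
# nn.CTCLoss expects (time, batch, vocab)
loss = ctc(log_probs.transpose(0, 1), targets, input_lens, target_lens)
```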
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
- Alleviating Over-smoothing for Unsupervised Sentence Representation [96.19497378628594]
We present a simple method named Self-Contrastive Learning (SSCL) to alleviate the over-smoothing issue.
Our proposed method is quite simple and can be easily extended to various state-of-the-art models for performance boosting.
arXiv Detail & Related papers (2023-05-09T11:00:02Z)
- Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences [1.8592384822257952]
We propose a novel topic modeling and inference algorithm.
We leverage pre-trained sentence embeddings by combining generative process models and clustering.
The evaluation shows that our method yields state-of-the-art results with relatively low computational demands.
arXiv Detail & Related papers (2023-02-06T20:13:11Z)
- EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models [26.462819114575172]
This work is the first to compare sparsity paradigms in text-to-speech synthesis.
arXiv Detail & Related papers (2022-09-22T09:47:25Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
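A minimal sketch of the noise-stability idea described above: perturb a hidden representation with Gaussian noise and penalize the drift of the downstream output. The noise scale, layer choice, and L2 penalty are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def noise_stability_penalty(model_tail, hidden, sigma=0.01):
    # model_tail: the layers applied after `hidden` (e.g., the top-k layers).
    # hidden: an intermediate representation, shape (batch, seq, dim).
    clean = model_tail(hidden)
    noisy = model_tail(hidden + sigma * torch.randn_like(hidden))
    return F.mse_loss(noisy, clean.detach())   # penalize output drift

# total_loss = task_loss + lambda_reg * noise_stability_penalty(tail, h)
```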
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- Prompt Consistency for Zero-Shot Task Generalization [118.81196556175797]
In this paper, we explore methods to utilize unlabeled data to improve zero-shot performance.
Specifically, we take advantage of the fact that multiple prompts can be used to specify a single task, and propose to regularize prompt consistency.
Our approach outperforms the state-of-the-art zero-shot learner, T0, on 9 out of 11 datasets across 4 NLP tasks by up to 10.6 absolute points in terms of accuracy.
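One plausible instantiation of prompt-consistency regularization is a symmetric KL penalty between the predictions the model makes for the same unlabeled input under two different prompts (the exact objective here is an assumption):

```python
import torch.nn.functional as F

def prompt_consistency_loss(logits_a, logits_b):
    # Predictions for the same input phrased with two different prompts.
    p = F.log_softmax(logits_a, dim=-1)
    q = F.log_softmax(logits_b, dim=-1)
    # Symmetric KL divergence between the two predictive distributions.
    return 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                  + F.kl_div(q, p, log_target=True, reduction="batchmean"))
```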
arXiv Detail & Related papers (2022-04-29T19:18:37Z)
- Contrastive Demonstration Tuning for Pre-trained Language Models [59.90340768724675]
Demonstration examples are crucial to the final performance of prompt-tuning.
The proposed approach can be (i) plugged into any previous prompt-tuning approach and (ii) extended to widespread classification tasks with a large number of categories.
Experimental results on 16 datasets illustrate that our method, integrated with the previous approaches LM-BFF and P-tuning, yields better performance.
arXiv Detail & Related papers (2022-04-09T05:30:48Z)
- Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation [28.432799973328127]
We propose Homomorphic Projective Distillation (HPD) to learn compressed sentence embeddings.
Our method augments a small Transformer encoder model with learnable projection layers to produce compact representations.
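A hedged sketch of that setup: a small student encoder's output and a frozen large teacher's output are passed through learnable projection layers so the compressed embeddings align. Dimensions and the cosine alignment loss are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

teacher_dim, student_dim, k = 1024, 384, 128   # illustrative sizes

student_proj = nn.Linear(student_dim, k)  # compress the small encoder's output
teacher_proj = nn.Linear(teacher_dim, k)  # map the teacher into the same space

def distill_loss(student_emb, teacher_emb):
    # teacher_emb comes from a frozen large model; both projections train
    # so that the compact embedding spaces align.
    s = F.normalize(student_proj(student_emb), dim=-1)
    t = F.normalize(teacher_proj(teacher_emb), dim=-1)
    return 1.0 - (s * t).sum(-1).mean()        # cosine alignment loss
```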
arXiv Detail & Related papers (2022-03-15T07:05:43Z)
- Efficient Constituency Parsing by Pointing [21.395573911155495]
We propose a novel constituency parsing model that casts the parsing problem into a series of pointing tasks.
Our model supports efficient top-down decoding and our learning objective is able to enforce structural consistency without resorting to the expensive CKY inference.
arXiv Detail & Related papers (2020-06-24T08:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.