Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
- URL: http://arxiv.org/abs/2305.10786v2
- Date: Mon, 23 Oct 2023 06:34:50 GMT
- Title: Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
- Authors: Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Chong Deng, Hai Yu,
Jiaqing Liu, Yukun Ma, Chong Zhang
- Abstract summary: Sentence embeddings from pre-trained language models suffer from a bias towards uninformative words.
We propose a simple and efficient unsupervised approach, Diagonal Attention Pooling (Ditto), which weights words with model-based importance estimations.
We show Ditto can alleviate the anisotropy problem and improve various pre-trained models on semantic textual similarity tasks.
- Score: 29.273438110694574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior studies diagnose the anisotropy problem in sentence representations
from pre-trained language models, e.g., BERT, without fine-tuning. Our analysis
reveals that the sentence embeddings from BERT suffer from a bias towards
uninformative words, limiting their performance on semantic textual similarity
(STS) tasks. To address this bias, we propose a simple and efficient
unsupervised approach, Diagonal Attention Pooling (Ditto), which weights words
with model-based importance estimations and computes the weighted average of
word representations from pre-trained models as sentence embeddings. Ditto can
be easily applied to any pre-trained language model as a postprocessing
operation. Compared to prior sentence embedding approaches, Ditto adds no
parameters and requires no learning. Empirical evaluations demonstrate that
our proposed Ditto can alleviate the anisotropy problem and improve various
pre-trained models on STS tasks.
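Since Ditto is described as a pure postprocessing step, the idea can be sketched in a few lines: take each token's self-attention value (the diagonal of an attention matrix) as its importance weight and compute a weighted average of the token representations. In the sketch below, the layer/head choice and the use of the averaged first- and last-layer hidden states are illustrative assumptions, not necessarily the configuration used in the paper.
```python
# Minimal sketch of diagonal attention pooling (Ditto). Assumptions: the
# importance weight of each token is the diagonal entry of one attention head
# (how much the token attends to itself), and token representations are the
# average of the first- and last-layer hidden states. The specific layer/head
# and layer combination used in the paper may differ.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"   # any BERT-style encoder
LAYER, HEAD = 1, 1                 # hypothetical layer/head choice

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def ditto_embeddings(sentences):
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs, output_attentions=True, output_hidden_states=True)
    attn = out.attentions[LAYER][:, HEAD]                  # (batch, seq, seq)
    weights = torch.diagonal(attn, dim1=-2, dim2=-1)       # self-attention diagonal
    weights = weights * inputs["attention_mask"]           # ignore padding tokens
    weights = weights / weights.sum(dim=-1, keepdim=True)  # normalize to sum to 1
    hidden = (out.hidden_states[0] + out.hidden_states[-1]) / 2
    return torch.einsum("bs,bsd->bd", weights, hidden)     # weighted average

emb = ditto_embeddings(["A man is playing a guitar.", "Someone plays the guitar."])
print(torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0).item())
```
Because the weights come from the pre-trained model itself, the procedure needs no extra parameters or training, matching the claim in the abstract.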
Related papers
- Projective Methods for Mitigating Gender Bias in Pre-trained Language Models [10.418595661963062]
Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters.
We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated.
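A minimal sketch of the projective idea, under simplifying assumptions, is to remove the component of each embedding along a single estimated bias direction. The random stand-in vectors and the one-dimensional bias direction below are illustrative; the paper's construction of the bias subspace is more involved.
```python
# Toy sketch of a projective debiasing step: subtract the component of each
# embedding that lies along an estimated bias direction. The vectors here are
# random stand-ins; in practice the direction might come from word-pair
# differences such as v("he") - v("she").
import torch

def project_out(embeddings: torch.Tensor, bias_dir: torch.Tensor) -> torch.Tensor:
    b = bias_dir / bias_dir.norm()                        # unit bias direction
    return embeddings - (embeddings @ b).unsqueeze(-1) * b

emb = torch.randn(4, 768)       # stand-in embeddings
bias_dir = torch.randn(768)     # stand-in bias direction
debiased = project_out(emb, bias_dir)
# The bias component is (numerically) gone after projection.
print((debiased @ (bias_dir / bias_dir.norm())).abs().max().item())
```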
arXiv Detail & Related papers (2024-03-27T17:49:31Z) - Towards preserving word order importance through Forced Invalidation [80.33036864442182]
We show that pre-trained language models are insensitive to word order.
We propose Forced Invalidation to help preserve the importance of word order.
Our experiments demonstrate that Forced Invalidation significantly improves the sensitivity of the models to word order.
arXiv Detail & Related papers (2023-04-11T13:42:10Z) - Improving Pre-trained Language Model Fine-tuning with Noise Stability
Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
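Reading the blurb as comparing a layer's output with and without injected standard Gaussian noise at its input, a rough sketch of such a regularizer is shown below. The layer choice, noise scale, and loss weighting are assumptions, not the paper's exact layerwise formulation.
```python
# Rough sketch of a noise-stability regularizer: perturb a hidden state with
# standard Gaussian noise and penalize how much the layer's output changes.
import torch
import torch.nn as nn

def noise_stability_loss(layer: nn.Module, hidden: torch.Tensor, sigma: float = 0.1):
    clean = layer(hidden)
    noisy = layer(hidden + sigma * torch.randn_like(hidden))
    return ((clean - noisy) ** 2).mean()

layer = nn.Sequential(nn.Linear(768, 768), nn.GELU())  # stand-in for one encoder layer
hidden = torch.randn(8, 128, 768)                      # (batch, seq, dim) stand-in
task_loss = torch.tensor(0.0)                          # placeholder fine-tuning loss
loss = task_loss + 0.1 * noise_stability_loss(layer, hidden)
loss.backward()
```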
arXiv Detail & Related papers (2022-06-12T04:42:49Z) - NoiER: An Approach for Training more Reliable Fine-TunedDownstream Task
Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models or additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z) - On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
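The flow idea can be illustrated with a single affine coupling layer trained to maximize the log-likelihood of sentence embeddings under a standard Gaussian base distribution; the actual flow architecture and training setup in the paper differ from this toy example.
```python
# Toy normalizing-flow step: one affine coupling layer mapping embeddings
# toward a standard Gaussian by maximizing the flow log-likelihood
# (constant log(2*pi) terms dropped since they do not affect gradients).
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * (dim - self.half)))

    def forward(self, z):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(z1).chunk(2, dim=-1)
        u2 = z2 * torch.exp(s) + t                          # invertible affine transform
        return torch.cat([z1, u2], dim=-1), s.sum(dim=-1)   # output, log|det J|

flow = AffineCoupling(768)
opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
embeddings = torch.randn(32, 768)   # stand-in for BERT sentence embeddings

u, log_det = flow(embeddings)
nll = (0.5 * (u ** 2).sum(dim=-1) - log_det).mean()   # negative log-likelihood
opt.zero_grad(); nll.backward(); opt.step()
```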
arXiv Detail & Related papers (2020-11-02T13:14:57Z) - Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting
BERT [29.04485839262945]
We propose a parameter-free probing technique for analyzing pre-trained language models (e.g., BERT)
Our method does not require direct supervision from the probing tasks, nor do we introduce additional parameters to the probing process.
Our experiments on BERT show that syntactic trees recovered from BERT using our method are significantly better than linguistically-uninformed baselines.
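The probe can be sketched as an impact matrix: the influence of token j on token i is the change in token i's representation when token j is additionally masked. The sketch below builds that matrix; extracting syntactic trees from it is omitted.
```python
# Sketch of the perturbed-masking impact matrix. impact[i, j] measures how much
# token i's representation changes when token j is additionally masked.
# Roughly O(n^2) forward passes, so only suitable for short sentences.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def impact_matrix(sentence: str) -> torch.Tensor:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    n, mask_id = ids.size(0), tokenizer.mask_token_id
    impact = torch.zeros(n, n)
    with torch.no_grad():
        for i in range(n):
            base = ids.clone()
            base[i] = mask_id                                   # mask token i
            h_i = model(base.unsqueeze(0)).last_hidden_state[0, i]
            for j in range(n):
                if j == i:
                    continue
                both = base.clone()
                both[j] = mask_id                               # also mask token j
                h_ij = model(both.unsqueeze(0)).last_hidden_state[0, i]
                impact[i, j] = torch.dist(h_i, h_ij)            # Euclidean distance
    return impact

print(impact_matrix("The quick brown fox jumps.").shape)
```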
arXiv Detail & Related papers (2020-04-30T14:02:29Z) - Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models
via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it on downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z) - Pre-training Is (Almost) All You Need: An Application to Commonsense
Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)