Learning to Remove: Towards Isotropic Pre-trained BERT Embedding
- URL: http://arxiv.org/abs/2104.05274v1
- Date: Mon, 12 Apr 2021 08:13:59 GMT
- Title: Learning to Remove: Towards Isotropic Pre-trained BERT Embedding
- Authors: Yuxin Liang, Rui Cao, Jie Zheng, Jie Ren, Ling Gao
- Abstract summary: Research in word representation shows that isotropic embeddings can significantly improve performance on downstream tasks.
We measure and analyze the geometry of the pre-trained BERT embedding space and find that it is far from isotropic.
We propose a simple yet effective method to fix this problem: remove several dominant directions of the BERT embedding space with a set of learnable weights.
- Score: 7.765987411382461
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained language models such as BERT have become a common choice for
natural language processing (NLP) tasks. Research in word representation shows
that isotropic embeddings can significantly improve performance on downstream
tasks. However, we measure and analyze the geometry of the pre-trained BERT
embedding space and find that it is far from isotropic. The word vectors are
not centered around the origin, and the average cosine similarity between
two random words is much higher than zero, which indicates that the word
vectors are distributed in a narrow cone, deteriorating the representation
capacity of the embedding. We propose a simple yet effective method to
fix this problem: remove several dominant directions of the BERT embedding
space with a set of learnable weights. We train the weights on word similarity
tasks and show that the processed embedding is more isotropic. Our method is
evaluated on three standardized tasks: word similarity, word analogy, and
semantic textual similarity. In all tasks, the word embedding processed by our
method consistently outperforms the original embedding (with an average
improvement of 13% on word analogy and 16% on semantic textual similarity) and
two baseline methods. Our method is also shown to be more robust to
hyperparameter changes.
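
The anisotropy diagnostics described in the abstract can be reproduced with a few lines of code. The sketch below (not the authors' code) measures the norm of the mean vector and the average cosine similarity between random word pairs; the use of the static input-embedding table of `bert-base-uncased` and the sample size of 10,000 pairs are assumptions for illustration.

```python
# A minimal sketch of the isotropy diagnostics: how far the embedding cloud
# is from the origin, and the mean cosine similarity of random word pairs.
# Assumes the static input-embedding table of bert-base-uncased stands in for
# the "pre-trained BERT embedding"; the paper may measure other layers too.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
emb = model.get_input_embeddings().weight.detach()   # (vocab_size, hidden)

# 1) The vectors are not centered around the origin: the mean vector has a
#    non-negligible norm compared with the average vector norm.
mean_vec = emb.mean(dim=0)
print("norm of mean vector:", mean_vec.norm().item())
print("average vector norm:", emb.norm(dim=1).mean().item())

# 2) Average cosine similarity between random word pairs; a value well above
#    zero indicates the vectors occupy a narrow cone.
g = torch.Generator().manual_seed(0)
n_pairs = 10_000                                      # illustrative sample size
i = torch.randint(emb.size(0), (n_pairs,), generator=g)
j = torch.randint(emb.size(0), (n_pairs,), generator=g)
cos = torch.nn.functional.cosine_similarity(emb[i], emb[j], dim=1)
print("average cosine similarity of random pairs:", cos.mean().item())
```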
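The removal step can be prototyped in the same spirit. The module below subtracts the top-k dominant directions, scaled by one learnable weight per direction; the weights would then be fit on a word-similarity objective, as the abstract outlines. The module name, the choice of k = 5, and the SVD-based estimate of the dominant directions are assumptions for illustration, not the paper's exact formulation.

```python
# A hedged sketch of "remove several dominant directions with learnable
# weights" -- not the authors' implementation. Dominant directions are
# approximated by the top right singular vectors of the mean-centered
# embedding matrix; each is removed to a learnable degree w_i.
import torch
import torch.nn as nn

class LearnedDirectionRemoval(nn.Module):
    def __init__(self, embeddings: torch.Tensor, k: int = 5):    # k is illustrative
        super().__init__()
        centered = embeddings - embeddings.mean(dim=0, keepdim=True)
        # Top-k right singular vectors = dominant directions of the embedding cloud.
        _, _, vh = torch.linalg.svd(centered, full_matrices=False)
        self.register_buffer("directions", vh[:k].clone())        # (k, hidden)
        self.weights = nn.Parameter(torch.ones(k))                # learnable removal weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Subtract each dominant direction from x, scaled by its learned weight.
        proj = x @ self.directions.T                              # (..., k) projections
        return x - (proj * self.weights) @ self.directions

# In this sketch the weights would be trained on a word-similarity objective,
# e.g. by maximizing the correlation between cosine similarities of the
# processed vectors and human similarity ratings.
```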
Related papers
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed NDD metric to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantic preservation rates while changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z) - Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing [18.91129968022831]
Interpretability methods need to be robust for trustworthy NLP applications in high-stakes areas such as medicine or finance.
Our paper demonstrates how interpretations can be manipulated by making simple word perturbations on an input text.
arXiv Detail & Related papers (2021-08-11T02:07:21Z) - Word2rate: training and evaluating multiple word embeddings as statistical transitions [4.350783459690612]
We introduce a novel left-right context split objective that improves performance for tasks sensitive to word order.
Our Word2rate model is grounded in a statistical foundation using rate matrices while remaining competitive in a variety of language tasks.
arXiv Detail & Related papers (2021-04-16T15:31:29Z) - Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z) - On the Sentence Embeddings from Pre-trained Language Models [78.45172445684126]
In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited.
We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity tasks.
We propose to transform the anisotropic sentence embedding distribution to a smooth and isotropic Gaussian distribution through normalizing flows that are learned with an unsupervised objective.
arXiv Detail & Related papers (2020-11-02T13:14:57Z) - Word Embeddings: Stability and Semantic Change [0.0]
We present an experimental study on the instability of the training process of three of the most influential embedding techniques of the last decade: word2vec, GloVe and fastText.
We propose a statistical model to describe the instability of embedding techniques and introduce a novel metric to measure the instability of the representation of an individual word.
arXiv Detail & Related papers (2020-07-23T16:03:50Z) - Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
However, selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z) - Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations.
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.