Frequency-based Distortions in Contextualized Word Embeddings
- URL: http://arxiv.org/abs/2104.08465v1
- Date: Sat, 17 Apr 2021 06:35:48 GMT
- Title: Frequency-based Distortions in Contextualized Word Embeddings
- Authors: Kaitlyn Zhou, Kawin Ethayarajh, Dan Jurafsky
- Abstract summary: This work explores the geometric characteristics of contextualized word embeddings with two novel tools.
Words of high and low frequency differ significantly with respect to their representational geometry.
BERT-Base has more trouble differentiating between South American and African countries than North American and European ones.
- Score: 29.88883761339757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How does word frequency in pre-training data affect the behavior of
similarity metrics in contextualized BERT embeddings? Are there systematic ways
in which some word relationships are exaggerated or understated? In this work,
we explore the geometric characteristics of contextualized word embeddings with
two novel tools: (1) an identity probe that predicts the identity of a word
using its embedding; (2) the minimal bounding sphere for a word's
contextualized representations. Our results reveal that words of high and low
frequency differ significantly with respect to their representational geometry.
Such differences introduce distortions: when compared to human judgments, point
estimates of embedding similarity (e.g., cosine similarity) can over- or
under-estimate the semantic similarity of two words, depending on the frequency
of those words in the training data. This has downstream societal implications:
BERT-Base has more trouble differentiating between South American and African
countries than North American and European ones. We find that these distortions
persist when using BERT-Multilingual, suggesting that they cannot be easily
fixed with additional data, which in turn introduces new distortions.
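As a rough illustration of the measurements described above, here is a minimal Python sketch (assuming the `transformers`, `torch`, and `numpy` packages) that collects a word's contextualized BERT embeddings across contexts, fits a crude bounding sphere, and compares mean-embedding cosine similarities. The probe and sphere construction are simplified stand-ins, not the authors' implementation.

```python
# Sketch: collect contextualized embeddings of a target word across contexts,
# approximate its bounding sphere, and compare cosine similarities.
# Assumes `pip install torch transformers numpy`; not the authors' exact code.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_embeddings(word, contexts):
    """Final-layer embedding of `word` (single-token words only) in each context."""
    vecs = []
    for sent in contexts:
        enc = tokenizer(sent, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]          # (seq_len, 768)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        for i, tok in enumerate(tokens):
            if tok == word:
                vecs.append(hidden[i].numpy())
    return np.array(vecs)

def bounding_sphere(vectors):
    """Crude approximation: centroid as center, max distance as radius."""
    center = vectors.mean(axis=0)
    radius = float(np.max(np.linalg.norm(vectors - center, axis=1)))
    return center, radius

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy usage with two single-token country names.
france = word_embeddings("france", ["She traveled to france last summer.",
                                    "The cuisine of france is famous."])
spain = word_embeddings("spain", ["He moved to spain for work.",
                                  "Wines from spain are well known."])
print("bounding-sphere radius (france):", bounding_sphere(france)[1])
print("cosine(france, spain):", cosine(france.mean(axis=0), spain.mean(axis=0)))
```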
Related papers
- Solving Cosine Similarity Underestimation between High Frequency Words
by L2 Norm Discounting [19.12036493733793]
We propose a method to discount the L2 norm of a contextualised word embedding by the frequency of that word in a corpus when measuring the cosine similarities between words.
Experimental results on a contextualised word similarity dataset show that our proposed discounting method accurately solves the similarity underestimation problem.
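A minimal numpy sketch of the general idea follows; the discount function is a hypothetical placeholder, and the way the discounted norms enter the similarity is one plausible reading rather than the paper's exact formula.

```python
# Sketch: cosine similarity with frequency-discounted L2 norms in the denominator.
# The discount function is a hypothetical placeholder, not the paper's formula.
import numpy as np

def discount(norm, freq, alpha=0.05):
    """Shrink the effective norm of a high-frequency word's embedding (placeholder)."""
    return norm / (1.0 + alpha * np.log1p(freq))

def discounted_cosine(u, v, freq_u, freq_v):
    return float(u @ v / (discount(np.linalg.norm(u), freq_u) *
                          discount(np.linalg.norm(v), freq_v)))

# Toy usage: higher corpus frequency -> larger upward correction of the score.
rng = np.random.default_rng(0)
u, v = rng.normal(size=768), rng.normal(size=768)
print(discounted_cosine(u, v, freq_u=500_000, freq_v=400_000))
print(discounted_cosine(u, v, freq_u=50, freq_v=40))
```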
arXiv Detail & Related papers (2023-05-17T23:41:30Z) - Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm to further discover the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - Investigating the Frequency Distortion of Word Embeddings and Its Impact
on Bias Metrics [2.1374208474242815]
We systematically study the association between frequency and semantic similarity in several static word embeddings.
We find that Skip-gram, GloVe and FastText embeddings tend to produce higher semantic similarity between high-frequency words than between other frequency combinations.
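One way such an association could be probed is sketched below, assuming pre-loaded static vectors and corpus counts; the variable names and the frequency cutoff are illustrative.

```python
# Sketch: average cosine similarity of random word pairs, grouped by the
# frequency band of the pair. `vectors` and `counts` are assumed to be
# pre-loaded static embeddings (e.g., GloVe) and corpus frequencies.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_by_band(vectors, counts, high_cutoff=10_000, n_pairs=50_000, seed=0):
    rng = np.random.default_rng(seed)
    words = list(vectors)
    bands = {"high-high": [], "high-low": [], "low-low": []}
    for _ in range(n_pairs):
        w1, w2 = rng.choice(words, size=2, replace=False)
        band = sorted("high" if counts[w] >= high_cutoff else "low" for w in (w1, w2))
        bands["-".join(band)].append(cosine(vectors[w1], vectors[w2]))
    return {band: float(np.mean(sims)) for band, sims in bands.items() if sims}
```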
arXiv Detail & Related papers (2022-11-15T15:11:06Z) - Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans
vs. BERT [64.40111510974957]
We test whether meaning interferes with subject-verb number agreement in English.
We generate semantically well-formed and nonsensical items.
We find that BERT and humans are both sensitive to our semantic manipulation.
arXiv Detail & Related papers (2022-09-21T17:57:23Z) - Lost in Context? On the Sense-wise Variance of Contextualized Word
Embeddings [11.475144702935568]
We quantify how much the contextualized embeddings of each word sense vary across contexts in typical pre-trained models.
We find that word representations are position-biased: the first words in different contexts tend to have more similar representations.
arXiv Detail & Related papers (2022-08-20T12:27:25Z) - Problems with Cosine as a Measure of Embedding Similarity for High
Frequency Words [45.58634797899206]
We find that cosine similarity underestimates the similarity of frequent words with other instances of the same word or other words across contexts.
We conjecture that this underestimation of similarity for high frequency words is due to differences in the representational geometry of high and low frequency words.
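A minimal sketch of the self-similarity measurement implied here, computed over precomputed contextualized vectors (random stand-ins are used below):

```python
# Sketch: per-word "self-similarity" -- the mean pairwise cosine similarity
# among a word's contextualized embeddings across contexts. Comparing this
# quantity for high- vs. low-frequency words illustrates the underestimation
# effect described above. Inputs are assumed to be precomputed contextualized
# vectors (e.g., from BERT) of shape (n_contexts, hidden_dim).
import numpy as np

def self_similarity(instance_vectors):
    X = instance_vectors / np.linalg.norm(instance_vectors, axis=1, keepdims=True)
    sims = X @ X.T                              # all pairwise cosines
    off_diag = sims[~np.eye(len(X), dtype=bool)]  # drop each vector vs. itself
    return float(off_diag.mean())

# Toy usage with random stand-ins for contextualized embeddings.
rng = np.random.default_rng(0)
frequent_word = rng.normal(size=(20, 768))
rare_word = rng.normal(size=(5, 768))
print(self_similarity(frequent_word), self_similarity(rare_word))
```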
arXiv Detail & Related papers (2022-05-10T18:00:06Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting distance, NDD, to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
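A simplified sketch of the mask-and-predict idea using a masked language model; the single-word alignment and symmetric KL divergence below are simplifications, not necessarily the paper's exact NDD formulation.

```python
# Simplified sketch: mask a word shared by two overlapping texts and compare
# the masked-LM distributions predicted at its position. The divergence used
# here (symmetric KL) and the single-word alignment are simplifications.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_distribution(text, word):
    """Distribution over the vocabulary at the position of `word`, masked out."""
    enc = tokenizer(text, return_tensors="pt")
    ids = enc["input_ids"][0]
    pos = (ids == tokenizer.convert_tokens_to_ids(word)).nonzero()[0, 0]
    ids[pos] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=ids.unsqueeze(0)).logits[0, pos]
    return torch.softmax(logits, dim=-1)

def neighbor_divergence(text_a, text_b, shared_word):
    p = masked_distribution(text_a, shared_word)
    q = masked_distribution(text_b, shared_word)
    kl = lambda a, b: torch.sum(a * (torch.log(a) - torch.log(b)))
    return float(kl(p, q) + kl(q, p))  # symmetric KL as a stand-in divergence

print(neighbor_divergence("the movie was great fun",
                          "the movie was terribly boring", "movie"))
```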
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Does BERT Understand Sentiment? Leveraging Comparisons Between
Contextual and Non-Contextual Embeddings to Improve Aspect-Based Sentiment
Models [0.0]
We show that training a model to compare a contextual embedding from BERT with a generic word embedding can be used to infer sentiment.
We also show that fine-tuning a subset of weights in the model built on this comparison achieves state-of-the-art results for Polarity Detection on Aspect-Based Sentiment Classification datasets.
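An illustrative sketch of one way such a comparison model could look; the architecture below is an assumption, not the paper's exact design.

```python
# Illustrative sketch of "comparing" a contextual embedding with a static one
# for sentiment: a small classifier over the concatenated pair and their
# difference. The exact comparison architecture in the paper may differ.
import torch
import torch.nn as nn

class ComparisonSentimentHead(nn.Module):
    def __init__(self, ctx_dim=768, static_dim=300, n_classes=3):
        super().__init__()
        self.project = nn.Linear(static_dim, ctx_dim)   # align dimensions
        self.classifier = nn.Sequential(
            nn.Linear(3 * ctx_dim, 256), nn.ReLU(), nn.Linear(256, n_classes)
        )

    def forward(self, contextual_emb, static_emb):
        s = self.project(static_emb)
        features = torch.cat([contextual_emb, s, contextual_emb - s], dim=-1)
        return self.classifier(features)

# Toy usage with random stand-ins for a BERT token embedding and a GloVe vector.
head = ComparisonSentimentHead()
print(head(torch.randn(1, 768), torch.randn(1, 300)).shape)  # torch.Size([1, 3])
```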
arXiv Detail & Related papers (2020-11-23T19:12:31Z) - A Comparative Study on Structural and Semantic Properties of Sentence
Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)