Effect of dimensionality change on the bias of word embeddings
- URL: http://arxiv.org/abs/2312.17292v1
- Date: Thu, 28 Dec 2023 13:01:10 GMT
- Title: Effect of dimensionality change on the bias of word embeddings
- Authors: Rohit Raj Rai, Amit Awekar
- Abstract summary: We study how the dimensionality change affects the bias of word embeddings.
There is a significant variation in the bias of word embeddings with the dimensionality change.
There is no uniformity in how the dimensionality change affects the bias of word embeddings.
- Score: 1.1784544255941167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embedding methods (WEMs) are extensively used for representing text
data. The dimensionality of these embeddings varies across various tasks and
implementations. The effect of dimensionality change on the accuracy of the
downstream task is a well-explored question. However, how the dimensionality
change affects the bias of word embeddings has received far less attention. Using the
English Wikipedia corpus, we study this effect for two static (Word2Vec and
fastText) and two context-sensitive (ELMo and BERT) WEMs. We have two
observations. First, there is a significant variation in the bias of word
embeddings with the dimensionality change. Second, there is no uniformity in
how the dimensionality change affects the bias of word embeddings. These
factors should be considered while selecting the dimensionality of word
embeddings.
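As a concrete illustration of the kind of measurement the abstract describes, the sketch below trains Word2Vec at several dimensionalities on a toy corpus and reports a bias score at each one. The WEAT effect size as the bias measure, gensim, the toy corpus, and the word lists are all assumptions made here for illustration; the paper's actual metric and Wikipedia-scale setup may differ.

```python
# Minimal sketch: probing how bias varies with embedding dimensionality.
# Assumptions not taken from the paper: gensim's Word2Vec, a toy corpus,
# and the WEAT effect size (Caliskan et al., 2017) as the bias measure.
import numpy as np
from gensim.models import Word2Vec

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(wv, X, Y, A, B):
    """Differential association of target sets X, Y with attribute sets A, B."""
    def s(w):
        return (np.mean([cos(wv[w], wv[a]) for a in A])
                - np.mean([cos(wv[w], wv[b]) for b in B]))
    sx, sy = [s(x) for x in X], [s(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

# Toy stand-in for the English Wikipedia corpus used in the paper.
corpus = [
    "he is a doctor and she is a nurse".split(),
    "the engineer said he fixed it".split(),
    "the teacher said she helped them".split(),
] * 200

X, Y = ["doctor", "engineer"], ["nurse", "teacher"]  # target words
A, B = ["he"], ["she"]                               # attribute words

for dim in (25, 50, 100, 200, 300):
    model = Word2Vec(corpus, vector_size=dim, min_count=1,
                     window=3, seed=7, workers=1)
    print(f"dim={dim:4d}  WEAT effect size="
          f"{weat_effect_size(model.wv, X, Y, A, B):+.3f}")
```

On a realistic corpus, comparing such scores across dimensionalities is exactly the kind of variation, and lack of uniformity, that the abstract reports.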
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z)
- Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection [34.217661429283666]
As the vocabulary grows, the embedding matrix grows with it, which can lead to a vast model size.
This paper explores word embedding dimension reduction.
We propose an efficient and effective weakly-supervised feature selection method named WordFS.
arXiv Detail & Related papers (2024-07-17T06:36:09Z)
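The summary above gives no algorithmic details, so the following is only a generic sketch of weakly-supervised feature selection over embedding dimensions, not the actual WordFS method: each dimension is scored by how well its contribution to the dot product correlates with a handful of gold similarity judgments, and only the top-k dimensions are kept. All data here are made up.

```python
# Hedged sketch of weakly-supervised feature selection for embedding
# dimension reduction. NOT the WordFS algorithm from the paper; a generic
# illustration of the idea it names.
import numpy as np

def select_dims(emb, pairs, gold, k):
    """emb: (V, D) matrix; pairs: (i, j) word-index pairs; gold: gold
    similarity per pair; returns the indices of the k kept dimensions."""
    contrib = np.array([emb[i] * emb[j] for i, j in pairs])  # (P, D)
    gold = np.asarray(gold, dtype=float)
    # Pearson correlation of each dimension's contribution with gold.
    c = contrib - contrib.mean(0)
    g = gold - gold.mean()
    scores = np.abs(c.T @ g) / (np.linalg.norm(c, axis=0)
                                * np.linalg.norm(g) + 1e-12)
    return np.argsort(scores)[-k:]

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 300))        # stand-in embedding matrix
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]  # weak supervision: a few pairs
gold = [0.9, 0.1, 0.7, 0.3]
kept = select_dims(emb, pairs, gold, k=64)
reduced = emb[:, kept]                    # (1000, 64) reduced embeddings
```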
- RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization [57.86083349873154]
Text-to-image customization aims to synthesize text-driven images for the given subjects.
Existing works follow the pseudo-word paradigm, i.e., represent the given subjects as pseudo-words and then compose them with the given text.
We present RealCustom that, for the first time, disentangles similarity from controllability by precisely limiting subject influence to relevant parts only.
arXiv Detail & Related papers (2024-03-01T12:12:09Z)
- Frequency-aware Dimension Selection for Static Word Embedding by Mixed Product Distance [22.374525706652207]
This paper proposes a metric (Mixed Product Distance, MPD) to select a proper dimension for word embedding algorithms without training any word embedding.
Experiments on both context-unavailable and context-available tasks demonstrate the better efficiency-performance trade-off of our MPD-based dimension selection method over baselines.
arXiv Detail & Related papers (2023-05-13T02:53:37Z)
- Neighboring Words Affect Human Interpretation of Saliency Explanations [65.29015910991261]
Word-level saliency explanations are often used to communicate feature-attribution in text-based models.
Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores.
We investigate how the marking of a word's neighboring words affects the explainee's perception of the word's importance in the context of a saliency explanation.
arXiv Detail & Related papers (2023-05-04T09:50:25Z)
- Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem [6.09170287691728]
In this study, we propose WordTour, an unsupervised one-dimensional word embedding.
To achieve this challenging goal, we decompose the desiderata of word embeddings into two parts: completeness and soundness.
Owing to its single dimensionality, WordTour is extremely efficient and provides a minimal means of handling word embeddings.
arXiv Detail & Related papers (2022-05-04T08:46:02Z)
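To make the WordTour idea above concrete: a tour orders the vocabulary so that consecutive words are close in the original embedding space, and each word's one-dimensional embedding is simply its position in the tour. The paper solves a proper travelling-salesman instance; the greedy nearest-neighbour pass below is only a cheap stand-in on synthetic vectors.

```python
# Toy sketch of a word tour: a greedy nearest-neighbour approximation to
# the TSP ordering the paper computes exactly.
import numpy as np

def greedy_word_tour(emb):
    """emb: (V, D) word vectors; returns a 1-D ordering (the 'tour')."""
    unvisited = set(range(1, len(emb)))
    tour = [0]
    while unvisited:
        last = emb[tour[-1]]
        # pick the nearest unvisited word to the current end of the tour
        nxt = min(unvisited, key=lambda i: np.linalg.norm(emb[i] - last))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

rng = np.random.default_rng(1)
emb = rng.normal(size=(50, 100))   # toy 100-d vectors for 50 "words"
order = greedy_word_tour(emb)
# Each word's one-dimensional embedding is its position in the tour.
one_d = {word_idx: pos for pos, word_idx in enumerate(order)}
```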
- Frequency-based Distortions in Contextualized Word Embeddings [29.88883761339757]
This work explores the geometric characteristics of contextualized word embeddings with two novel tools.
Words of high and low frequency differ significantly with respect to their representational geometry.
BERT-Base has more trouble differentiating between South American and African countries than North American and European ones.
arXiv Detail & Related papers (2021-04-17T06:35:48Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
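The entry above notes that its change detection works with any alignment method. One standard choice is orthogonal Procrustes alignment between two embedding spaces (e.g. two time periods), with per-word cosine distance after alignment as the change score. The sketch below shows that generic recipe on synthetic data; it is not the paper's self-supervised model.

```python
# Sketch of a standard alignment-based semantic-change pipeline:
# orthogonal Procrustes alignment, then per-word cosine distance.
import numpy as np

def procrustes_align(A, B):
    """Orthogonal W minimizing ||A W - B||_F (rows = shared vocabulary)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def change_scores(A, B):
    A_aligned = A @ procrustes_align(A, B)
    num = np.sum(A_aligned * B, axis=1)
    den = (np.linalg.norm(A_aligned, axis=1)
           * np.linalg.norm(B, axis=1) + 1e-12)
    return 1.0 - num / den   # higher = more semantic change

rng = np.random.default_rng(2)
A = rng.normal(size=(500, 100))             # embeddings from period 1
B = A + 0.05 * rng.normal(size=(500, 100))  # slightly shifted period 2
scores = change_scores(A, B)
print("most-changed word index:", int(np.argmax(scores)))
```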
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Word Embeddings: Stability and Semantic Change [0.0]
We present an experimental study on the instability of the training process of three of the most influential embedding techniques of the last decade: word2vec, GloVe and fastText.
We propose a statistical model to describe the instability of embedding techniques and introduce a novel metric to measure the instability of the representation of an individual word.
arXiv Detail & Related papers (2020-07-23T16:03:50Z)
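The paper above defines its own per-word instability metric, which the summary does not spell out. A widely used stand-in, sketched below under that assumption, is one minus the Jaccard overlap of a word's k nearest neighbours across two independently trained embedding spaces.

```python
# Hedged sketch of a common per-word instability measure: neighbourhood
# overlap across two embedding runs (not necessarily the paper's metric).
import numpy as np

def knn(emb, i, k):
    """Indices of the k nearest neighbours of word i (cosine similarity)."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = e @ e[i]
    sims[i] = -np.inf                  # exclude the word itself
    return set(np.argsort(sims)[-k:])

def instability(emb_run1, emb_run2, i, k=10):
    n1, n2 = knn(emb_run1, i, k), knn(emb_run2, i, k)
    return 1.0 - len(n1 & n2) / len(n1 | n2)   # 0 = stable, 1 = unstable

rng = np.random.default_rng(3)
run1 = rng.normal(size=(200, 50))              # two independent runs
run2 = run1 + 0.1 * rng.normal(size=(200, 50))
print("instability of word 0:", round(instability(run1, run2, 0), 3))
```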
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.