Frequency-aware Dimension Selection for Static Word Embedding by Mixed Product Distance
- URL: http://arxiv.org/abs/2305.07826v1
- Date: Sat, 13 May 2023 02:53:37 GMT
- Title: Frequency-aware Dimension Selection for Static Word Embedding by Mixed Product Distance
- Authors: Lingfeng Shen, Haiyun Jiang, Lemao Liu, Ying Chen
- Abstract summary: This paper proposes a metric (Mixed Product Distance, MPD) to select a proper dimension for word embedding algorithms without training any word embedding.
Experiments on both context-unavailable and context-available tasks demonstrate the better efficiency-performance trade-off of our MPD-based dimension selection method over baselines.
- Score: 22.374525706652207
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Static word embeddings are still useful, particularly for context-unavailable tasks, because when no context is available, pre-trained language models often perform worse than static word embeddings. Although dimension is a key factor determining the quality of static word embeddings, automatic dimension selection is rarely discussed. In this paper, we investigate the impact of word frequency on dimension selection, and empirically find that word frequency is so vital that it must be taken into account during dimension selection. Based on this finding, we propose a dimension selection method that uses a metric (Mixed Product Distance, MPD) to select a proper dimension for word embedding algorithms without training any word embedding. By applying a post-processing function to oracle matrices, the MPD-based method de-emphasizes the impact of word frequency. Experiments on both context-unavailable and context-available tasks demonstrate that our MPD-based dimension selection method achieves a better efficiency-performance trade-off than baselines.
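The abstract does not spell out the MPD formula, so the sketch below only illustrates the recipe it describes: score candidate dimensions against a post-processed oracle matrix without training any embedding, in the spirit of the PIP-loss baseline this line of work builds on. The `alpha` exponent, the centering that stands in for the frequency-de-emphasizing post-processing, and the Frobenius distance are assumptions, not the paper's definitions.

```python
import numpy as np

def select_dimension(clean, noisy, dims, alpha=0.5):
    """Pick an embedding dimension without training any embedding.
    `clean` plays the role of an oracle signal matrix (e.g., PMI) and
    `noisy` an empirical estimate of it. Placeholder math, not MPD."""
    uo, so, _ = np.linalg.svd(clean, full_matrices=False)
    un, sn, _ = np.linalg.svd(noisy, full_matrices=False)
    oracle = uo * so**alpha                  # full-rank oracle embedding
    oracle -= oracle.mean(axis=0)            # stand-in post-processing to
                                             # soften word-frequency effects
    target = oracle @ oracle.T
    scores = {}
    for d in dims:
        cand = un[:, :d] * sn[:d]**alpha     # rank-d candidate embedding
        scores[d] = np.linalg.norm(target - cand @ cand.T)
    return min(scores, key=scores.get)

rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 100))
signal = signal @ signal.T / 100                       # symmetric "signal"
noisy = signal + 0.1 * rng.normal(size=signal.shape)   # noisy estimate
print(select_dimension(signal, noisy, dims=[5, 20, 50, 90]))
```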
Related papers
- Scalable Dynamic Embedding Size Search for Streaming Recommendation [54.28404337601801]
Real-world recommender systems often operate in streaming recommendation scenarios.
The number of users and items continues to grow, leading to substantial storage resource consumption.
We learn lightweight embeddings for streaming recommendation, called SCALL, which can adaptively adjust the embedding sizes of users and items.
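As a rough illustration of adaptive size allocation (not SCALL's learned policy), the toy heuristic below gives more frequent users/items larger embeddings under a total parameter budget; the size tiers and greedy rule are invented for the example.

```python
import numpy as np

def allocate_sizes(freqs, budget, sizes=(8, 16, 32, 64)):
    """Give more frequent users/items larger embeddings under a total
    parameter budget. Greedy tiers invented for illustration."""
    out = np.full(len(freqs), sizes[0])        # everyone starts at the minimum
    used = sizes[0] * len(freqs)
    for i in np.argsort(freqs)[::-1]:          # most frequent first
        for s in sorted(sizes, reverse=True):  # largest size that still fits
            if used - out[i] + s <= budget:
                used += s - out[i]
                out[i] = s
                break
    return out

freqs = np.array([100, 5, 40, 1, 60])          # interaction counts
print(allocate_sizes(freqs, budget=150))       # -> [64  8 32  8 32]
```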
arXiv Detail & Related papers (2024-07-22T06:37:24Z)
- Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection [34.217661429283666]
As the vocabulary grows, the vector space's dimension increases, which can lead to a vast model size.
This paper explores word embedding dimension reduction.
We propose an efficient and effective weakly-supervised feature selection method named WordFS.
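A minimal sketch of weakly supervised dimension selection in this spirit, assuming the weak signal is a handful of labeled word pairs; the per-dimension correlation score is an illustrative stand-in, not the WordFS criterion.

```python
import numpy as np

def select_dims(emb, pairs, labels, k):
    """Keep the k dimensions whose elementwise products on weakly labeled
    word pairs (1 = similar, 0 = not) best correlate with the labels."""
    prods = np.array([emb[i] * emb[j] for i, j in pairs])   # (n_pairs, dim)
    yc = np.asarray(labels, dtype=float) - np.mean(labels)
    xc = prods - prods.mean(axis=0)
    denom = np.linalg.norm(xc, axis=0) * np.linalg.norm(yc) + 1e-12
    corr = np.abs(xc.T @ yc) / denom                        # |corr| per dim
    return emb[:, np.argsort(corr)[-k:]]

rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 16))                        # 50 words, 16 dims
pairs, labels = [(0, 1), (2, 3), (4, 5)], [1, 0, 1]    # weak supervision
print(select_dims(emb, pairs, labels, k=8).shape)      # (50, 8)
```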
arXiv Detail & Related papers (2024-07-17T06:36:09Z)
- Effect of dimensionality change on the bias of word embeddings [1.1784544255941167]
We study how changes in dimensionality affect the bias of word embeddings.
Bias varies significantly with dimensionality, and there is no uniform pattern in how dimensionality changes affect it.
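To make such a measurement concrete, a simple WEAT-style association score can be recomputed at several dimensionalities; truncating one trained embedding, as below, is a crude stand-in for the paper's setup of comparing embeddings of different sizes.

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def association(w, A, B):
    # mean cosine to attribute set A minus mean cosine to attribute set B
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def bias_across_dims(emb, targets, A_ids, B_ids, dims):
    """Average association of the target words, recomputed after
    truncating the embedding to each candidate dimensionality."""
    return {d: np.mean([association(emb[t, :d], emb[A_ids, :d], emb[B_ids, :d])
                        for t in targets])
            for d in dims}

rng = np.random.default_rng(0)
emb = rng.normal(size=(30, 64))
print(bias_across_dims(emb, targets=[0, 1], A_ids=[2, 3], B_ids=[4, 5],
                       dims=[8, 16, 32, 64]))
```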
arXiv Detail & Related papers (2023-12-28T13:01:10Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
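The granularity choice is easy to demonstrate with any retriever: index the same text twice at different unit sizes and compare. The TF-IDF retriever and naive sentence splitting below are stand-ins; the paper's propositions are atomic, self-contained facts, not raw sentences.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(units, query, k=2):
    """Rank indexed units against a query; the caller decides what a
    'unit' is (document, passage, sentence, ...)."""
    vec = TfidfVectorizer().fit(units + [query])
    scores = cosine_similarity(vec.transform([query]),
                               vec.transform(units)).ravel()
    return sorted(zip(scores, units), reverse=True)[:k]

doc = ("The Eiffel Tower is in Paris. It was completed in 1889. "
       "Paris hosts millions of visitors each year.")
sentences = [s.strip() + "." for s in doc.split(".") if s.strip()]
print(retrieve([doc], "When was the Eiffel Tower completed?", k=1))      # passage
print(retrieve(sentences, "When was the Eiffel Tower completed?", k=1))  # finer unit
```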
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- A Process for Topic Modelling Via Word Embeddings [0.0]
This work combines algorithms based on word embeddings, dimensionality reduction, and clustering.
The objective is to obtain topics from a set of unclassified texts.
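A minimal version of such a pipeline, using TF-IDF vectors as a stand-in for averaged word embeddings and PCA plus k-means for the reduction and clustering stages (the paper's concrete algorithm choices may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

texts = ["the cat sat on the mat", "dogs and cats are friendly pets",
         "stocks fell sharply on monday", "the market rallied after the report"]

X = TfidfVectorizer().fit_transform(texts).toarray()   # 1) embed the texts
Xr = PCA(n_components=2).fit_transform(X)              # 2) reduce dimensionality
labels = KMeans(n_clusters=2, n_init=10,               # 3) cluster; each cluster
                random_state=0).fit_predict(Xr)        #    is treated as a topic
print(labels)                                          # one topic label per text
```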
arXiv Detail & Related papers (2023-10-06T15:10:35Z)
- Generalized Time Warping Invariant Dictionary Learning for Time Series Classification and Clustering [8.14208923345076]
Dynamic time warping (DTW) is commonly used to handle temporal delays, scaling, transformation, and many other kinds of temporal misalignment.
We propose a generalized time-warping-invariant dictionary learning algorithm.
The superiority of the proposed method in terms of dictionary learning, classification, and clustering is validated on ten public datasets.
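For reference, the classic DTW recurrence that the method generalizes; the dictionary-learning algorithm itself is beyond this sketch:

```python
import numpy as np

def dtw(x, y):
    """Classic dynamic-time-warping distance between two 1-D series;
    the building block the dictionary-learning method generalizes."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw([0, 1, 2, 3], [0, 0, 1, 2, 2, 3]))  # 0.0: same shape, different timing
```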
arXiv Detail & Related papers (2023-06-30T14:18:13Z)
- Single-channel speech separation using Soft-minimum Permutation Invariant Training [60.99112031408449]
A long-standing problem in supervised speech separation is finding the correct label for each separated speech signal.
Permutation Invariant Training (PIT) has been shown to be a promising solution in handling the label ambiguity problem.
In this work, we propose a probabilistic optimization framework to address the inefficiency of PIT in finding the best output-label assignment.
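Standard PIT, sketched below with MSE as a stand-in separation loss, scores every output-to-label permutation and keeps the best; this exhaustive search is exactly the inefficiency the proposed probabilistic framework targets.

```python
import itertools
import numpy as np

def pit_mse(estimates, references):
    """Evaluate MSE under every output-to-label permutation and keep
    the best one (the exhaustive search PIT is known for)."""
    n = len(estimates)
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(n)):
        loss = np.mean([np.mean((estimates[i] - references[p]) ** 2)
                        for i, p in enumerate(perm)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

est = [np.array([1.0, 2.0]), np.array([0.0, 0.0])]
ref = [np.array([0.1, 0.0]), np.array([1.0, 2.1])]
print(pit_mse(est, ref))   # picks the swapped assignment (1, 0)
```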
arXiv Detail & Related papers (2021-11-16T17:25:05Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
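Since the method is said to work with any alignment method, a common baseline alignment is orthogonal Procrustes, sketched here with cosine distance as the change score; the self-supervised shift modeling itself is not reproduced.

```python
import numpy as np

def semantic_change(emb_t1, emb_t2):
    """Align space 1 onto space 2 with orthogonal Procrustes, then score
    each word by the cosine distance between its aligned vectors."""
    u, _, vt = np.linalg.svd(emb_t1.T @ emb_t2)
    aligned = emb_t1 @ (u @ vt)              # best orthogonal map of space 1
    num = np.sum(aligned * emb_t2, axis=1)
    den = (np.linalg.norm(aligned, axis=1)
           * np.linalg.norm(emb_t2, axis=1) + 1e-12)
    return 1.0 - num / den                   # higher = more change

rng = np.random.default_rng(0)
e1 = rng.normal(size=(100, 50))
e2 = e1 @ np.linalg.qr(rng.normal(size=(50, 50)))[0]   # a pure rotation
print(semantic_change(e1, e2).max())                   # ~0: no real change
```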
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel hyperplane-based approach for the automatic extraction of domain-specific stop words.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
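One rough reading of the hyperplane idea, under the assumption that a linear classifier is fit to separate domain text from general text and that words with near-zero hyperplane weight become stop-word candidates; the paper's exact procedure may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

domain = ["the patient showed acute symptoms",
          "dosage was adjusted for the patient today"]
general = ["the weather was nice today",
           "we went to the market in the morning"]

vec = CountVectorizer()
X = vec.fit_transform(domain + general)
y = [1] * len(domain) + [0] * len(general)

# Words with near-zero weight on the separating hyperplane carry little
# domain signal and become stop-word candidates.
w = np.abs(LinearSVC(C=1.0).fit(X, y).coef_.ravel())
terms = np.array(vec.get_feature_names_out())
print(terms[np.argsort(w)[:5]])              # five lowest-weight terms
```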
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Word Embeddings: Stability and Semantic Change [0.0]
We present an experimental study on the instability of the training process of three of the most influential embedding techniques of the last decade: word2vec, GloVe and fastText.
We propose a statistical model to describe the instability of embedding techniques and introduce a novel metric to measure the instability of the representation of an individual word.
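A common concrete instance of such an instability measure for a single word is nearest-neighbour overlap across two training runs; the paper's statistical model and per-word metric are more refined than this sketch.

```python
import numpy as np

def nn_overlap(emb_a, emb_b, word, k=10):
    """Fraction of the word's k nearest neighbours shared by two training
    runs of the same algorithm; low overlap means an unstable word."""
    def knn(emb):
        sims = emb @ emb[word] / (np.linalg.norm(emb, axis=1)
                                  * np.linalg.norm(emb[word]) + 1e-12)
        return set(np.argsort(sims)[::-1][1:k + 1])   # skip the word itself
    return len(knn(emb_a) & knn(emb_b)) / k

rng = np.random.default_rng(0)
run_a = rng.normal(size=(500, 100))
run_b = run_a + 0.01 * rng.normal(size=run_a.shape)   # a near-identical run
print(nn_overlap(run_a, run_b, word=0))               # close to 1.0
```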
arXiv Detail & Related papers (2020-07-23T16:03:50Z)
- Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies [60.285091454321055]
We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
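A sketch of the parameterization only (the learning procedure is omitted): each word's embedding is a sparse mix of a small anchor set, so the dense table never has to be stored. The sizes and sparsity level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, n_anchors, dim, nnz = 10000, 64, 100, 4

A = rng.normal(size=(n_anchors, dim))       # small set of anchor embeddings
T = np.zeros((vocab, n_anchors))            # sparse transformation matrix
for w in range(vocab):                      # each word mixes a few anchors
    idx = rng.choice(n_anchors, size=nnz, replace=False)
    T[w, idx] = rng.normal(size=nnz)

E = T @ A                                   # full embedding table, on demand
dense_params = vocab * dim
ant_params = n_anchors * dim + vocab * nnz * 2   # anchors + (index, value) pairs
print(dense_params, ant_params)             # 1000000 vs 86400
```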
arXiv Detail & Related papers (2020-03-18T13:07:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.