Autoencoding Improves Pre-trained Word Embeddings
- URL: http://arxiv.org/abs/2010.13094v2
- Date: Tue, 27 Oct 2020 07:51:34 GMT
- Title: Autoencoding Improves Pre-trained Word Embeddings
- Authors: Masahiro Kaneko and Danushka Bollegala
- Abstract summary: We experimentally verify our theoretical claims and show that retaining the top principal components is useful for improving pre-trained word embeddings.
- Score: 26.464097783864926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior work investigating the geometry of pre-trained word embeddings has
shown that word embeddings are distributed in a narrow cone, and that by
centering and projecting using principal component vectors one can increase the
accuracy of a given set of pre-trained word embeddings. However, theoretically,
this post-processing step is equivalent to applying a linear autoencoder that
minimises the squared ℓ2 reconstruction error. This result contradicts prior
work (Mu and Viswanath, 2018) that proposed to remove the top principal
components from pre-trained embeddings. We experimentally verify our
theoretical claims and show that retaining the top principal components is
indeed useful for improving pre-trained word embeddings, without requiring
access to additional linguistic resources or labelled data.
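As a concrete illustration of the post-processing described above, the following minimal NumPy sketch centres a set of pre-trained embeddings and reconstructs each vector from its top principal components, which is the operation a linear autoencoder with tied weights performs when trained to minimise the squared ℓ2 reconstruction error. The matrix shape, the value of k and the function name are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def retain_top_components(E: np.ndarray, k: int = 50) -> np.ndarray:
    """Centre pre-trained word embeddings and reconstruct them from the
    top-k principal components (the linear-autoencoder view of the
    post-processing step)."""
    mu = E.mean(axis=0)                      # centre the embedding cloud
    X = E - mu
    # Top principal directions via SVD of the centred matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T                             # (dimension, k)
    # Encode then decode: project onto the components and map back.
    return (X @ V) @ V.T + mu

# Toy usage: random vectors stand in for real pre-trained embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(10000, 300))            # e.g. 10k words, 300-dim vectors
E_post = retain_top_components(E, k=50)
print(E_post.shape)                          # (10000, 300)
```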
Related papers
- How Truncating Weights Improves Reasoning in Language Models [49.80959223722325]
We study how certain global associations tend to be stored in specific weight components or Transformer blocks.
We analyze how this arises during training, both empirically and theoretically.
arXiv Detail & Related papers (2024-06-05T08:51:08Z)
- An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a post-processing step that replaces rare subwords with their component subwords (a toy sketch of this replacement follows this entry).
We show that vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.
arXiv Detail & Related papers (2024-03-30T15:29:49Z)
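To make the trimming operation concrete, here is a toy Python sketch under simple assumptions: the vocabulary is a map from subword to corpus frequency, rare subwords fall below a frequency threshold, and they are re-segmented by greedy longest match over the remaining subwords (falling back to single characters). This is an illustration of the idea only, not the procedure used by any particular BPE toolkit or by the paper itself.

```python
from typing import Dict, List

def segment(token: str, vocab: set) -> List[str]:
    """Greedy longest-match segmentation of `token` into subwords from
    `vocab`, falling back to single characters."""
    pieces, i = [], 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in vocab or j == i + 1:
                pieces.append(token[i:j])
                i = j
                break
    return pieces

def trim_vocabulary(freqs: Dict[str, int], threshold: int) -> Dict[str, List[str]]:
    """Map each rare subword (frequency below `threshold`) to the component
    subwords that replace it after trimming."""
    kept = {sw for sw, f in freqs.items() if f >= threshold or len(sw) == 1}
    return {sw: segment(sw, kept) for sw in freqs if sw not in kept}

# Toy example with made-up frequencies: 'unbreak' is rare and gets replaced.
freqs = {"un": 120, "break": 95, "unbreak": 3,
         "u": 500, "n": 480, "b": 300, "r": 290, "e": 640, "a": 700, "k": 210}
print(trim_vocabulary(freqs, threshold=10))   # {'unbreak': ['un', 'break']}
```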
- Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings [39.30342855873457]
We aim to improve word embeddings by incorporating more contextual information into the Skip-gram framework.
Our methods are shown to outperform the baselines by a large margin.
arXiv Detail & Related papers (2022-10-30T14:15:43Z)
- TransDrift: Modeling Word-Embedding Drift using Transformer [8.707217592903735]
We propose TransDrift, a transformer-based prediction model for word embeddings.
Our model accurately learns the dynamics of the embedding drift and predicts the future embedding.
Our embeddings lead to superior performance compared to the previous methods.
arXiv Detail & Related papers (2022-06-16T10:48:26Z)
- Out-of-Manifold Regularization in Contextual Embedding Space for Text Classification [22.931314501371805]
We propose a new approach to finding and regularizing the remainder of the space, referred to as out-of-manifold.
We synthesize the out-of-manifold embeddings based on two embeddings obtained from actually-observed words.
A discriminator is trained to detect whether an input embedding lies inside the manifold, and simultaneously a generator is optimized to produce new embeddings that can be easily identified as out-of-manifold (a schematic sketch of this setup follows this entry).
arXiv Detail & Related papers (2021-05-14T10:17:59Z)
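The following schematic PyTorch sketch mirrors the setup summarised in the entry above under illustrative assumptions (the embedding dimension, the network sizes, and an MLP generator over a pair of observed embeddings are all made up; this is not the authors' implementation): a generator synthesises an embedding from two observed ones, a discriminator learns to separate observed from synthesised embeddings, and the generator is pushed to produce embeddings the discriminator confidently marks as out-of-manifold.

```python
import torch
import torch.nn as nn

dim = 768  # assumed contextual embedding size

generator = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
discriminator = nn.Sequential(nn.Linear(dim, dim // 2), nn.ReLU(), nn.Linear(dim // 2, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def training_step(e1: torch.Tensor, e2: torch.Tensor):
    """One step on a batch of embedding pairs taken from observed words."""
    fake = generator(torch.cat([e1, e2], dim=-1))   # synthesised embedding

    # Discriminator: observed embeddings -> in-manifold (1), synthesised -> 0.
    d_loss = bce(discriminator(e1), torch.ones(e1.size(0), 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(e1.size(0), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: produce embeddings easily identified as out-of-manifold.
    g_loss = bce(discriminator(fake), torch.zeros(e1.size(0), 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Toy usage: random vectors stand in for embeddings of observed words.
e1, e2 = torch.randn(32, dim), torch.randn(32, dim)
print(training_step(e1, e2))
```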
- Dictionary-based Debiasing of Pre-trained Word Embeddings [28.378270372391498]
We propose a method for debiasing pre-trained word embeddings using dictionaries.
Our proposed method does not require the types of biases to be pre-defined in the form of word lists.
Experimental results on standard benchmark datasets show that the proposed method can accurately remove unfair biases encoded in pre-trained word embeddings.
arXiv Detail & Related papers (2021-01-23T15:44:23Z)
- Bi-tuning of Pre-trained Representations [79.58542780707441]
Bi-tuning is a general learning framework for fine-tuning both supervised and unsupervised pre-trained representations on downstream tasks.
Bi-tuning generalizes vanilla fine-tuning by integrating two heads upon the backbone of pre-trained representations.
Bi-tuning achieves state-of-the-art results, by large margins, when fine-tuning both supervised and unsupervised pre-trained models.
arXiv Detail & Related papers (2020-11-12T03:32:25Z)
- Learning Efficient Task-Specific Meta-Embeddings with Word Prisms [17.288765083303243]
We introduce word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand.
We evaluate word prisms in comparison to other meta-embedding methods on six extrinsic evaluations and observe that word prisms offer improvements on all tasks.
arXiv Detail & Related papers (2020-11-05T16:08:50Z)
- Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
- Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
arXiv Detail & Related papers (2020-05-18T06:04:58Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)