TransDrift: Modeling Word-Embedding Drift using Transformer
- URL: http://arxiv.org/abs/2206.08081v1
- Date: Thu, 16 Jun 2022 10:48:26 GMT
- Title: TransDrift: Modeling Word-Embedding Drift using Transformer
- Authors: Nishtha Madaan, Prateek Chaudhury, Nishant Kumar, Srikanta Bedathur
- Abstract summary: We propose TransDrift, a transformer-based prediction model for word embeddings.
Our model accurately learns the dynamics of the embedding drift and predicts the future embedding.
Applied to downstream classification tasks, our predicted embeddings lead to superior performance compared to previous methods.
- Score: 8.707217592903735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern NLP applications, word embeddings are a crucial backbone that can
be readily shared across a number of tasks. However, as the text distributions
change and word semantics evolve over time, the downstream applications using
the embeddings can suffer if the word representations do not conform to the
data drift. Thus, maintaining word embeddings to be consistent with the
underlying data distribution is a key problem. In this work, we tackle this
problem and propose TransDrift, a transformer-based prediction model for word
embeddings. Leveraging the flexibility of the transformer architecture, our model accurately
learns the dynamics of the embedding drift and predicts the future embedding.
In experiments, we compare with existing methods and show that our model makes
significantly more accurate predictions of the word embedding than the
baselines. Crucially, by applying the predicted embeddings as a backbone for
downstream classification tasks, we show that our embeddings lead to superior
performance compared to the previous methods.
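The abstract describes the model only at a high level. The sketch below is one plausible reading, assuming TransDrift consumes a word's embeddings from earlier time slices as a sequence and regresses the embedding at the next time step with a transformer encoder trained under a mean-squared-error objective; the class name, argument names, and hyperparameters are illustrative assumptions, not taken from the authors' implementation.

```python
import torch
import torch.nn as nn

class DriftPredictor(nn.Module):
    """Illustrative transformer that reads a word's embeddings from past
    time slices and regresses its embedding at the next time step.
    Names and hyperparameters are assumptions, not the authors' code."""

    def __init__(self, emb_dim=300, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.in_proj = nn.Linear(emb_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out_proj = nn.Linear(d_model, emb_dim)

    def forward(self, past_embs):            # (batch, time, emb_dim)
        h = self.encoder(self.in_proj(past_embs))
        return self.out_proj(h[:, -1])       # predicted next-step embedding

# Toy usage: a batch of 8 words, each with embeddings from 2 past time slices.
model = DriftPredictor()
past = torch.randn(8, 2, 300)
target = torch.randn(8, 300)                 # embeddings at the next time step
loss = nn.functional.mse_loss(model(past), target)
loss.backward()
```

Under this reading, the predicted embeddings would then replace the stale ones as the backbone for downstream classifiers, mirroring the evaluation described in the abstract.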
Related papers
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has driven significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Word Sense Induction with Knowledge Distillation from BERT [6.88247391730482]
This paper proposes a method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context.
Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings.
arXiv Detail & Related papers (2023-04-20T21:05:35Z)
- Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z)
- Paragraph-based Transformer Pre-training for Multi-Sentence Inference [99.59693674455582]
We show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks.
We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences.
arXiv Detail & Related papers (2022-05-02T21:41:14Z)
- Word2rate: training and evaluating multiple word embeddings as statistical transitions [4.350783459690612]
We introduce a novel left-right context split objective that improves performance for tasks sensitive to word order.
Our Word2rate model is grounded in a statistical foundation using rate matrices while remaining competitive in a variety of language tasks.
arXiv Detail & Related papers (2021-04-16T15:31:29Z)
- Statistically significant detection of semantic shifts using contextual word embeddings [7.439525715543974]
We propose an approach to estimate semantic shifts by combining contextual word embeddings with permutation-based statistical tests (a minimal sketch of such a test appears after this list).
We demonstrate the performance of this approach in simulation, achieving consistently high precision by suppressing false positives.
We additionally analyze real-world data from SemEval-2020 Task 1 and the Liverpool FC subreddit corpus.
arXiv Detail & Related papers (2021-04-08T13:58:54Z)
- Improved Biomedical Word Embeddings in the Transformer Era [2.978663539080876]
We learn word and concept embeddings by first using the skip-gram method and further fine-tuning them with correlational information.
We conduct evaluations of these tuned static embeddings using multiple datasets for word relatedness developed by previous efforts.
arXiv Detail & Related papers (2020-12-22T03:03:50Z)
- Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)
- Multiple Word Embeddings for Increased Diversity of Representation [15.279850826041066]
We show a technique that substantially and consistently improves performance over a strong baseline with negligible increase in run time.
We analyze aspects of pre-trained embedding similarity and vocabulary coverage and find that representational diversity is the driving force behind this technique's effectiveness.
arXiv Detail & Related papers (2020-09-30T02:33:09Z)
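For the "Statistically significant detection of semantic shifts" entry above, the following is a minimal sketch of one common form of permutation test over contextual embeddings: a word's shift is scored as the distance between its mean contextual embedding in two time periods, and significance is estimated by shuffling occurrences across periods. The statistic and function name are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def permutation_shift_test(embs_t1, embs_t2, n_perm=1000, seed=0):
    """Score semantic shift of one word as the distance between its mean
    contextual embeddings in two periods, and estimate a p-value by
    randomly reassigning occurrences to periods. (Illustrative sketch.)"""
    rng = np.random.default_rng(seed)
    observed = np.linalg.norm(embs_t1.mean(0) - embs_t2.mean(0))
    pooled = np.vstack([embs_t1, embs_t2])
    n1 = len(embs_t1)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        a, b = pooled[perm[:n1]], pooled[perm[n1:]]
        if np.linalg.norm(a.mean(0) - b.mean(0)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy usage: 50 contextual embeddings per period, 768-d (e.g. BERT-sized).
shift, p = permutation_shift_test(np.random.randn(50, 768),
                                  np.random.randn(50, 768) + 0.1)
```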
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.