Obtaining Better Static Word Embeddings Using Contextual Embedding Models
- URL: http://arxiv.org/abs/2106.04302v1
- Date: Tue, 8 Jun 2021 12:59:32 GMT
- Title: Obtaining Better Static Word Embeddings Using Contextual Embedding Models
- Authors: Prakhar Gupta and Martin Jaggi
- Abstract summary: Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
- Score: 53.86080627007695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of contextual word embeddings -- representations of words which
incorporate semantic and syntactic information from their context -- has led to
tremendous improvements on a wide variety of NLP tasks. However, recent
contextual models have prohibitively high computational cost in many use-cases
and are often hard to interpret. In this work, we demonstrate that our proposed
distillation method, a simple extension of CBOW-based training, significantly
improves the computational efficiency of NLP applications while yielding
embeddings that outperform existing static embeddings trained from scratch as
well as those distilled by previously proposed methods. As a side-effect, our
approach also allows a fair comparison of both contextual and static embeddings
via standard lexical evaluation tasks.
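The abstract only describes the approach at a high level. As a rough, non-authoritative illustration of distilling static vectors from a contextual teacher with a CBOW-style objective, one might write something like the sketch below; the teacher model, dimensions, window size and loss are assumptions, not the authors' released code.
```python
# Hypothetical sketch of CBOW-style distillation from a contextual teacher:
# averaged static context vectors are trained to match the teacher's contextual
# embedding of the centre token. All choices below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

teacher_name = "bert-base-uncased"                        # assumed teacher model
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModel.from_pretrained(teacher_name).eval()

static_dim = 300
static_emb = nn.Embedding(tokenizer.vocab_size, static_dim)   # static vectors being learned
proj = nn.Linear(static_dim, teacher.config.hidden_size)      # maps static space to teacher space
opt = torch.optim.Adam(list(static_emb.parameters()) + list(proj.parameters()), lr=1e-3)

def distill_step(sentence: str, window: int = 5) -> float:
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        ctx = teacher(**enc).last_hidden_state[0]         # (seq_len, teacher_dim)
    ids = enc["input_ids"][0]
    loss = torch.zeros(())
    for i in range(1, len(ids) - 1):                      # skip [CLS] / [SEP]
        lo, hi = max(1, i - window), min(len(ids) - 1, i + window + 1)
        context_ids = torch.cat([ids[lo:i], ids[i + 1:hi]])
        if len(context_ids) == 0:
            continue
        bag = static_emb(context_ids).mean(dim=0)         # CBOW-style bag of context vectors
        loss = loss + F.mse_loss(proj(bag), ctx[i])
    if loss.requires_grad:                                # at least one centre token had context
        opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)
```
The actual method may define the teacher target and objective differently; this only shows the CBOW-style training structure the abstract refers to.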
Related papers
- Manual Verbalizer Enrichment for Few-Shot Text Classification [1.860409237919611]
MAVE is an approach for verbalizer construction by enriching class labels.
Our model achieves state-of-the-art results while using significantly fewer resources.
arXiv Detail & Related papers (2024-10-08T16:16:47Z)
- Free Lunch for Efficient Textual Commonsense Integration in Language Models [20.02647320786556]
We group training samples with similar commonsense descriptions into a single batch, thus reusing the encoded description across multiple samples.
Extensive experiments illustrate that the proposed batch partitioning approach effectively reduces the computational cost while preserving performance.
The efficiency improvement is more pronounced on larger datasets and on devices with more memory capacity, attesting to its practical utility for large-scale applications.
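As a hedged illustration of the batch-partitioning idea described in this summary (the data structures, encoder interface and all names below are hypothetical, not the paper's implementation):
```python
# Samples sharing a commonsense description land in the same batch, so each
# description is encoded once per batch and reused across its samples.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]   # (input_text, commonsense_description)

def partition_into_batches(samples: List[Sample], batch_size: int) -> List[List[Sample]]:
    by_description: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_description[sample[1]].append(sample)
    batches: List[List[Sample]] = []
    current: List[Sample] = []
    # Fill batches description by description, so each batch holds few distinct descriptions.
    for group in by_description.values():
        for sample in group:
            current.append(sample)
            if len(current) == batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches

def encode_batch(batch: List[Sample], encode: Callable[[str], object]):
    cache: Dict[str, object] = {}          # encoded description reused within the batch
    for text, description in batch:
        if description not in cache:
            cache[description] = encode(description)
        yield text, cache[description]
```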
arXiv Detail & Related papers (2023-05-24T19:14:57Z)
- Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings [39.30342855873457]
We aim to improve word embeddings by incorporating more contextual information into the Skip-gram framework.
Our methods are shown to outperform the baselines by a large margin.
arXiv Detail & Related papers (2022-10-30T14:15:43Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model so that they remain stable under the perturbation.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
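A minimal sketch of a noise-stability regularizer in this spirit, assuming a HuggingFace-style encoder that accepts `inputs_embeds` and exposes per-layer `hidden_states`; the interface and hyper-parameters are assumptions, not the paper's code:
```python
# Hidden representations computed with and without injected Gaussian noise are
# encouraged to stay close, layer by layer.
import torch
import torch.nn as nn

def noise_stability_loss(model: nn.Module,
                         input_ids: torch.Tensor,
                         attention_mask: torch.Tensor,
                         sigma: float = 0.01) -> torch.Tensor:
    embeds = model.get_input_embeddings()(input_ids)
    clean = model(inputs_embeds=embeds, attention_mask=attention_mask,
                  output_hidden_states=True).hidden_states
    noisy_embeds = embeds + sigma * torch.randn_like(embeds)   # inject Gaussian noise
    noisy = model(inputs_embeds=noisy_embeds, attention_mask=attention_mask,
                  output_hidden_states=True).hidden_states
    # Mean squared difference summed over all layers (layer-wise stability).
    return sum(((c - n) ** 2).mean() for c, n in zip(clean, noisy))

# During fine-tuning one might use: loss = task_loss + reg_weight * noise_stability_loss(...)
```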
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation [65.29170569821093]
Parallel text generation has received widespread attention due to its generation efficiency.
In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information.
Experiment results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z)
- Denoising Word Embeddings by Averaging in a Shared Space [34.175826109538676]
We introduce a new approach for smoothing and improving the quality of word embeddings.
We project all the models to a shared vector space using an efficient implementation of the Generalized Procrustes Analysis (GPA) procedure.
As the new representations are more stable and reliable, there is a noticeable improvement in rare word evaluations.
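A simplified numpy sketch of Procrustes-based averaging over a shared vocabulary, assuming the models' rows are already aligned to the same words; this is a rough iterative variant for illustration, not the paper's full GPA implementation:
```python
# Each embedding model is rotated onto the current consensus, then the
# rotated models are averaged to form new, "denoised" embeddings.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def average_in_shared_space(models: list, n_iters: int = 5) -> np.ndarray:
    """models: list of (vocab_size, dim) numpy arrays with rows aligned to the same words."""
    mean = models[0].copy()
    for _ in range(n_iters):
        aligned = []
        for emb in models:
            # Orthogonal rotation that best maps this model onto the current mean.
            rotation, _ = orthogonal_procrustes(emb, mean)
            aligned.append(emb @ rotation)
        mean = np.mean(aligned, axis=0)    # updated consensus embeddings
    return mean
```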
arXiv Detail & Related papers (2021-06-05T19:49:02Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
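As a non-authoritative sketch of Monte Carlo dropout uncertainty for instance acquisition (the model interface and the entropy-based score are assumptions, not the paper's exact setup):
```python
# Dropout stays active at inference time; disagreement across stochastic
# forward passes is summarised as predictive entropy per sequence.
import torch

def mc_dropout_uncertainty(model, input_ids, attention_mask, n_samples: int = 10):
    model.train()                           # enables dropout layers
    probs = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
            probs.append(torch.softmax(logits, dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)                  # (batch, seq, classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    return entropy.mean(dim=-1)             # one uncertainty score per sequence

# The highest-scoring unlabeled instances would be sent for annotation; a
# distilled model could serve here to cheapen the repeated forward passes.
```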
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Analysis and Evaluation of Language Models for Word Sense Disambiguation [18.001457030065712]
Transformer-based language models have taken many fields in NLP by storm.
BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense.
BERT and its derivatives dominate most of the existing evaluation benchmarks.
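One common way such sense distinctions are probed, sketched here only as a hedged illustration (model choice and helper names are assumptions, not the paper's protocol), is nearest-neighbour classification over contextual embeddings of a few labelled examples per sense:
```python
# Classify a word occurrence by comparing its contextual embedding against
# contextual embeddings of a handful of labelled example occurrences per sense.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed_occurrence(sentence: str, word: str) -> torch.Tensor:
    """Mean of the contextual vectors of the word-piece tokens covering `word`."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = enc(**inputs).last_hidden_state[0]
    piece_ids = tok(word, add_special_tokens=False)["input_ids"]
    ids = inputs["input_ids"][0].tolist()
    for i in range(len(ids) - len(piece_ids) + 1):       # locate the word's pieces
        if ids[i:i + len(piece_ids)] == piece_ids:
            return states[i:i + len(piece_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

def predict_sense(query: torch.Tensor, labelled) -> str:
    """labelled: list of (embedding, sense_label); 1-NN by cosine similarity."""
    sims = [(torch.cosine_similarity(query, vec, dim=0).item(), sense)
            for vec, sense in labelled]
    return max(sims)[1]
```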
arXiv Detail & Related papers (2020-08-26T15:07:07Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.