Obtaining Better Static Word Embeddings Using Contextual Embedding Models
- URL: http://arxiv.org/abs/2106.04302v1
- Date: Tue, 8 Jun 2021 12:59:32 GMT
- Title: Obtaining Better Static Word Embeddings Using Contextual Embedding Models
- Authors: Prakhar Gupta and Martin Jaggi
- Abstract summary: Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
- Score: 53.86080627007695
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of contextual word embeddings -- representations of words which
incorporate semantic and syntactic information from their context -- has led to
tremendous improvements on a wide variety of NLP tasks. However, recent
contextual models have prohibitively high computational cost in many use-cases
and are often hard to interpret. In this work, we demonstrate that our proposed
distillation method, a simple extension of CBOW-based training, significantly
improves the computational efficiency of NLP applications while yielding
embeddings that outperform existing static embeddings trained from scratch as
well as those distilled by previously proposed methods. As a side-effect, our
approach also allows a fair comparison of both contextual and static embeddings
via standard lexical evaluation tasks.
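The abstract only describes the approach at a high level. As a rough, non-authoritative illustration of distilling static vectors from a contextual teacher with a CBOW-style objective, one might write something like the sketch below; the teacher model, dimensions, window size and loss are assumptions, not the authors' released code.
```python
# Hypothetical sketch of CBOW-style distillation from a contextual teacher:
# averaged static context vectors are trained to match the teacher's contextual
# embedding of the centre token. All choices below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

teacher_name = "bert-base-uncased"                        # assumed teacher model
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModel.from_pretrained(teacher_name).eval()

static_dim = 300
static_emb = nn.Embedding(tokenizer.vocab_size, static_dim)   # static vectors being learned
proj = nn.Linear(static_dim, teacher.config.hidden_size)      # maps static space to teacher space
opt = torch.optim.Adam(list(static_emb.parameters()) + list(proj.parameters()), lr=1e-3)

def distill_step(sentence: str, window: int = 5) -> float:
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        ctx = teacher(**enc).last_hidden_state[0]         # (seq_len, teacher_dim)
    ids = enc["input_ids"][0]
    loss = torch.zeros(())
    for i in range(1, len(ids) - 1):                      # skip [CLS] / [SEP]
        lo, hi = max(1, i - window), min(len(ids) - 1, i + window + 1)
        context_ids = torch.cat([ids[lo:i], ids[i + 1:hi]])
        if len(context_ids) == 0:
            continue
        bag = static_emb(context_ids).mean(dim=0)         # CBOW-style bag of context vectors
        loss = loss + F.mse_loss(proj(bag), ctx[i])
    if loss.requires_grad:                                # at least one centre token had context
        opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)
```
The actual method may define the teacher target and objective differently; this only shows the CBOW-style training structure the abstract refers to.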
Related papers
- Manual Verbalizer Enrichment for Few-Shot Text Classification [1.860409237919611]
MAVE is an approach for verbalizer construction by enriching class labels.
Our model achieves state-of-the-art results while using significantly fewer resources.
arXiv Detail & Related papers (2024-10-08T16:16:47Z)
- Free Lunch for Efficient Textual Commonsense Integration in Language Models [20.02647320786556]
We group training samples with similar commonsense descriptions into a single batch, thus reusing the encoded description across multiple samples.
Extensive experiments illustrate that the proposed batch partitioning approach effectively reduces the computational cost while preserving performance.
The efficiency improvement is more pronounced on larger datasets and on devices with more memory capacity, attesting to its practical utility for large-scale applications.
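As a hedged illustration of the batch-partitioning idea described in this summary (the data structures, encoder interface and all names below are hypothetical, not the paper's implementation):
```python
# Samples sharing a commonsense description land in the same batch, so each
# description is encoded once per batch and reused across its samples.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]   # (input_text, commonsense_description)

def partition_into_batches(samples: List[Sample], batch_size: int) -> List[List[Sample]]:
    by_description: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_description[sample[1]].append(sample)
    batches: List[List[Sample]] = []
    current: List[Sample] = []
    # Fill batches description by description, so each batch holds few distinct descriptions.
    for group in by_description.values():
        for sample in group:
            current.append(sample)
            if len(current) == batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)
    return batches

def encode_batch(batch: List[Sample], encode: Callable[[str], object]):
    cache: Dict[str, object] = {}          # encoded description reused within the batch
    for text, description in batch:
        if description not in cache:
            cache[description] = encode(description)
        yield text, cache[description]
```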
arXiv Detail & Related papers (2023-05-24T19:14:57Z)
- Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings [39.30342855873457]
We aim to improve word embeddings by incorporating more contextual information into the Skip-gram framework.
Our methods are shown to outperform the baselines by a large margin.
arXiv Detail & Related papers (2022-10-30T14:15:43Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model so that they remain stable under the perturbation.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
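A minimal sketch of a noise-stability regularizer in this spirit, assuming a HuggingFace-style encoder that accepts `inputs_embeds` and exposes per-layer `hidden_states`; the interface and hyper-parameters are assumptions, not the paper's code:
```python
# Hidden representations computed with and without injected Gaussian noise are
# encouraged to stay close, layer by layer.
import torch
import torch.nn as nn

def noise_stability_loss(model: nn.Module,
                         input_ids: torch.Tensor,
                         attention_mask: torch.Tensor,
                         sigma: float = 0.01) -> torch.Tensor:
    embeds = model.get_input_embeddings()(input_ids)
    clean = model(inputs_embeds=embeds, attention_mask=attention_mask,
                  output_hidden_states=True).hidden_states
    noisy_embeds = embeds + sigma * torch.randn_like(embeds)   # inject Gaussian noise
    noisy = model(inputs_embeds=noisy_embeds, attention_mask=attention_mask,
                  output_hidden_states=True).hidden_states
    # Mean squared difference summed over all layers (layer-wise stability).
    return sum(((c - n) ** 2).mean() for c, n in zip(clean, noisy))

# During fine-tuning one might use: loss = task_loss + reg_weight * noise_stability_loss(...)
```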
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation [65.29170569821093]
Parallel text generation has received widespread attention due to its generation efficiency.
In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information.
Experiment results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z)
- Denoising Word Embeddings by Averaging in a Shared Space [34.175826109538676]
We introduce a new approach for smoothing and improving the quality of word embeddings.
We project all the models to a shared vector space using an efficient implementation of the Generalized Procrustes Analysis (GPA) procedure.
As the new representations are more stable and reliable, there is a noticeable improvement in rare word evaluations.
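A simplified numpy sketch of Procrustes-based averaging over a shared vocabulary, assuming the models' rows are already aligned to the same words; this is a rough iterative variant for illustration, not the paper's full GPA implementation:
```python
# Each embedding model is rotated onto the current consensus, then the
# rotated models are averaged to form new, "denoised" embeddings.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def average_in_shared_space(models: list, n_iters: int = 5) -> np.ndarray:
    """models: list of (vocab_size, dim) numpy arrays with rows aligned to the same words."""
    mean = models[0].copy()
    for _ in range(n_iters):
        aligned = []
        for emb in models:
            # Orthogonal rotation that best maps this model onto the current mean.
            rotation, _ = orthogonal_procrustes(emb, mean)
            aligned.append(emb @ rotation)
        mean = np.mean(aligned, axis=0)    # updated consensus embeddings
    return mean
```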
arXiv Detail & Related papers (2021-06-05T19:49:02Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
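As a non-authoritative sketch of Monte Carlo dropout uncertainty for instance acquisition (the model interface and the entropy-based score are assumptions, not the paper's exact setup):
```python
# Dropout stays active at inference time; disagreement across stochastic
# forward passes is summarised as predictive entropy per sequence.
import torch

def mc_dropout_uncertainty(model, input_ids, attention_mask, n_samples: int = 10):
    model.train()                           # enables dropout layers
    probs = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
            probs.append(torch.softmax(logits, dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)                  # (batch, seq, classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    return entropy.mean(dim=-1)             # one uncertainty score per sequence

# The highest-scoring unlabeled instances would be sent for annotation; a
# distilled model could serve here to cheapen the repeated forward passes.
```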
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Analysis and Evaluation of Language Models for Word Sense Disambiguation [18.001457030065712]
Transformer-based language models have taken many fields in NLP by storm.
BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense.
BERT and its derivatives dominate most of the existing evaluation benchmarks.
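One common way such sense distinctions are probed, sketched here only as a hedged illustration (model choice and helper names are assumptions, not the paper's protocol), is nearest-neighbour classification over contextual embeddings of a few labelled examples per sense:
```python
# Classify a word occurrence by comparing its contextual embedding against
# contextual embeddings of a handful of labelled example occurrences per sense.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed_occurrence(sentence: str, word: str) -> torch.Tensor:
    """Mean of the contextual vectors of the word-piece tokens covering `word`."""
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = enc(**inputs).last_hidden_state[0]
    piece_ids = tok(word, add_special_tokens=False)["input_ids"]
    ids = inputs["input_ids"][0].tolist()
    for i in range(len(ids) - len(piece_ids) + 1):       # locate the word's pieces
        if ids[i:i + len(piece_ids)] == piece_ids:
            return states[i:i + len(piece_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

def predict_sense(query: torch.Tensor, labelled) -> str:
    """labelled: list of (embedding, sense_label); 1-NN by cosine similarity."""
    sims = [(torch.cosine_similarity(query, vec, dim=0).item(), sense)
            for vec, sense in labelled]
    return max(sims)[1]
```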
arXiv Detail & Related papers (2020-08-26T15:07:07Z)
- SAMBA: Safe Model-Based & Active Reinforcement Learning [59.01424351231993]
SAMBA is a framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics.
We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations.
We provide intuition as to the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
arXiv Detail & Related papers (2020-06-12T10:40:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.