Using Context-to-Vector with Graph Retrofitting to Improve Word
Embeddings
- URL: http://arxiv.org/abs/2210.16848v2
- Date: Thu, 23 Mar 2023 14:35:30 GMT
- Title: Using Context-to-Vector with Graph Retrofitting to Improve Word
Embeddings
- Authors: Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang
Zhao, Yue Zhang, Stan Z. Li
- Abstract summary: We aim to improve word embeddings by incorporating more contextual information into the Skip-gram framework.
Our methods are shown to outperform the baselines by a large margin.
- Score: 39.30342855873457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although contextualized embeddings generated from large-scale pre-trained
models perform well in many tasks, traditional static embeddings (e.g.,
Skip-gram, Word2Vec) still play an important role in low-resource and
lightweight settings due to their low computational cost, ease of deployment,
and stability. In this paper, we aim to improve word embeddings by 1)
incorporating more contextual information from existing pre-trained models into
the Skip-gram framework, which we call Context-to-Vec; 2) proposing a
post-processing retrofitting method for static embeddings, independent of
training, that employs prior synonym knowledge and a weighted vector
distribution. On both extrinsic and intrinsic tasks, our methods are shown to
outperform the baselines by a large margin.
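As a rough illustration of the retrofitting idea (not the authors' exact weighted formulation), the sketch below applies a classic synonym-graph retrofitting update in the spirit of Faruqui et al.; the toy lexicon, uniform edge weights, and the alpha parameter are illustrative assumptions.

```python
import numpy as np

def retrofit(embeddings, synonyms, iterations=10, alpha=1.0):
    """embeddings: dict word -> np.ndarray; synonyms: dict word -> set of synonym words."""
    new_vecs = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbors in synonyms.items():
            nbrs = [n for n in neighbors if n in new_vecs]
            if word not in new_vecs or not nbrs:
                continue
            # Each synonym edge gets weight 1/len(nbrs); the original vector gets weight alpha.
            beta = 1.0 / len(nbrs)
            numerator = alpha * embeddings[word] + beta * sum(new_vecs[n] for n in nbrs)
            new_vecs[word] = numerator / (alpha + beta * len(nbrs))
    return new_vecs

# Toy usage: three 2-d vectors and one synonym pair.
vecs = {
    "happy": np.array([1.0, 0.0]),
    "glad": np.array([0.0, 1.0]),
    "sad": np.array([-1.0, 0.0]),
}
lexicon = {"happy": {"glad"}, "glad": {"happy"}}
retrofitted = retrofit(vecs, lexicon)
print(retrofitted["happy"])  # pulled toward "glad", but still anchored to its original vector
```

Each iteration pulls a word vector toward its synonyms while anchoring it to its original embedding, which is the basic mechanism behind lexicon-based retrofitting.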
Related papers
- Manual Verbalizer Enrichment for Few-Shot Text Classification [1.860409237919611]
MAVE is an approach for verbalizer construction through enrichment of class labels.
Our model achieves state-of-the-art results while using significantly fewer resources.
arXiv Detail & Related papers (2024-10-08T16:16:47Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further employed to maintain the stability of the VLMs' zero-shot generalization; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer).
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines on MoSECroT, it falls short of some strong baselines.
arXiv Detail & Related papers (2024-01-09T21:09:07Z) - Order Matters in the Presence of Dataset Imbalance for Multilingual
Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks.
We show its improvements in neural machine translation (NMT) and multilingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table
Semantic Parsing [19.779493883522072]
This paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.
We propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner.
arXiv Detail & Related papers (2021-11-18T02:51:04Z) - Obtaining Better Static Word Embeddings Using Contextual Embedding
Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z) - Autoencoding Improves Pre-trained Word Embeddings [26.464097783864926]
We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings (a minimal illustrative sketch of this idea follows the list below).
arXiv Detail & Related papers (2020-10-25T11:30:05Z) - Multiple Word Embeddings for Increased Diversity of Representation [15.279850826041066]
We show a technique that substantially and consistently improves performance over a strong baseline with negligible increase in run time.
We analyze aspects of pre-trained embedding similarity and vocabulary coverage and find that the representational diversity is the driving force of why this technique works.
arXiv Detail & Related papers (2020-09-30T02:33:09Z)
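As referenced in the "Autoencoding Improves Pre-trained Word Embeddings" entry above, the following is a minimal sketch of post-processing pre-trained embeddings by keeping only their top principal components; the cut-off k=50, the centering step, and the random stand-in matrix are assumptions for illustration, not that paper's exact recipe.

```python
import numpy as np

def keep_top_components(E, k=50):
    """Project embeddings E (vocab_size x dim) onto their top-k principal components."""
    mean = E.mean(axis=0, keepdims=True)
    centered = E - mean
    # Rows of Vt are the principal directions of the centered embedding matrix.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:k]                              # (k, dim) leading directions
    return centered @ top.T @ top + mean      # reconstruction from the top-k subspace

E = np.random.randn(1000, 300).astype(np.float32)  # stand-in for real pre-trained vectors
E_proc = keep_top_components(E, k=50)
print(E_proc.shape)  # (1000, 300)
```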
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.