Using Context-to-Vector with Graph Retrofitting to Improve Word
Embeddings
- URL: http://arxiv.org/abs/2210.16848v2
- Date: Thu, 23 Mar 2023 14:35:30 GMT
- Title: Using Context-to-Vector with Graph Retrofitting to Improve Word
Embeddings
- Authors: Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang
Zhao, Yue Zhang, Stan Z. Li
- Abstract summary: We aim to improve word embeddings by incorporating more contextual information into the Skip-gram framework.
Our methods are shown to outperform the baselines by a large margin.
- Score: 39.30342855873457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although contextualized embeddings generated from large-scale pre-trained
models perform well in many tasks, traditional static embeddings (e.g.,
Skip-gram, Word2Vec) still play an important role in low-resource and
lightweight settings due to their low computational cost, ease of deployment,
and stability. In this paper, we aim to improve word embeddings by 1)
incorporating more contextual information from existing pre-trained models into
the Skip-gram framework, which we call Context-to-Vec; 2) proposing a
post-processing retrofitting method for static embeddings, independent of
training, that employs prior synonym knowledge and a weighted vector
distribution. On both extrinsic and intrinsic tasks, our methods are shown to
outperform the baselines by a large margin.
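As a rough illustration of the retrofitting idea (not the authors' exact weighted formulation), the sketch below applies a classic synonym-graph retrofitting update in the spirit of Faruqui et al.; the toy lexicon, uniform edge weights, and the alpha parameter are illustrative assumptions.

```python
import numpy as np

def retrofit(embeddings, synonyms, iterations=10, alpha=1.0):
    """embeddings: dict word -> np.ndarray; synonyms: dict word -> set of synonym words."""
    new_vecs = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbors in synonyms.items():
            nbrs = [n for n in neighbors if n in new_vecs]
            if word not in new_vecs or not nbrs:
                continue
            # Each synonym edge gets weight 1/len(nbrs); the original vector gets weight alpha.
            beta = 1.0 / len(nbrs)
            numerator = alpha * embeddings[word] + beta * sum(new_vecs[n] for n in nbrs)
            new_vecs[word] = numerator / (alpha + beta * len(nbrs))
    return new_vecs

# Toy usage: three 2-d vectors and one synonym pair.
vecs = {
    "happy": np.array([1.0, 0.0]),
    "glad": np.array([0.0, 1.0]),
    "sad": np.array([-1.0, 0.0]),
}
lexicon = {"happy": {"glad"}, "glad": {"happy"}}
retrofitted = retrofit(vecs, lexicon)
print(retrofitted["happy"])  # pulled toward "glad", but still anchored to its original vector
```

Each iteration pulls a word vector toward its synonyms while anchoring it to its original embedding, which is the basic mechanism behind lexicon-based retrofitting.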
Related papers
- Manual Verbalizer Enrichment for Few-Shot Text Classification [1.860409237919611]
MAVE is an approach for verbalizer construction through enrichment of class labels.
Our model achieves state-of-the-art results while using significantly fewer resources.
arXiv Detail & Related papers (2024-10-08T16:16:47Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further employed to maintain the stability of the VLMs' zero-shot generalization; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer).
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines on MoSECroT, it falls short of some strong baselines.
arXiv Detail & Related papers (2024-01-09T21:09:07Z) - Order Matters in the Presence of Dataset Imbalance for Multilingual
Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks.
We show its improvements in neural machine translation (NMT) and multilingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z) - Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z) - SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table
Semantic Parsing [19.779493883522072]
This paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.
We propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner.
arXiv Detail & Related papers (2021-11-18T02:51:04Z) - Obtaining Better Static Word Embeddings Using Contextual Embedding
Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z) - Autoencoding Improves Pre-trained Word Embeddings [26.464097783864926]
We experimentally verify our theoretical claims and show that retaining the top principal components is indeed useful for improving pre-trained word embeddings (a minimal illustrative sketch of this idea follows the list below).
arXiv Detail & Related papers (2020-10-25T11:30:05Z) - Multiple Word Embeddings for Increased Diversity of Representation [15.279850826041066]
We show a technique that substantially and consistently improves performance over a strong baseline with negligible increase in run time.
We analyze aspects of pre-trained embedding similarity and vocabulary coverage and find that the representational diversity is the driving force of why this technique works.
arXiv Detail & Related papers (2020-09-30T02:33:09Z)
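As referenced in the "Autoencoding Improves Pre-trained Word Embeddings" entry above, the following is a minimal sketch of post-processing pre-trained embeddings by keeping only their top principal components; the cut-off k=50, the centering step, and the random stand-in matrix are assumptions for illustration, not that paper's exact recipe.

```python
import numpy as np

def keep_top_components(E, k=50):
    """Project embeddings E (vocab_size x dim) onto their top-k principal components."""
    mean = E.mean(axis=0, keepdims=True)
    centered = E - mean
    # Rows of Vt are the principal directions of the centered embedding matrix.
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:k]                              # (k, dim) leading directions
    return centered @ top.T @ top + mean      # reconstruction from the top-k subspace

E = np.random.randn(1000, 300).astype(np.float32)  # stand-in for real pre-trained vectors
E_proc = keep_top_components(E, k=50)
print(E_proc.shape)  # (1000, 300)
```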
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.