Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings
- URL: http://arxiv.org/abs/2210.16848v2
- Date: Thu, 23 Mar 2023 14:35:30 GMT
- Title: Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings
- Authors: Jiangbin Zheng, Yile Wang, Ge Wang, Jun Xia, Yufei Huang, Guojiang Zhao, Yue Zhang, Stan Z. Li
- Abstract summary: We aim to improve word embeddings by incorporating more contextual information into the Skip-gram framework.
Our methods are shown to outperform the baselines by a large margin.
- Score: 39.30342855873457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although contextualized embeddings generated from large-scale pre-trained
models perform well in many tasks, traditional static embeddings (e.g.,
Skip-gram, Word2Vec) still play an important role in low-resource and
lightweight settings due to their low computational cost, ease of deployment,
and stability. In this paper, we aim to improve word embeddings by 1)
incorporating more contextual information from existing pre-trained models into
the Skip-gram framework, which we call Context-to-Vec; and 2) proposing a
post-processing retrofitting method for static embeddings, independent of
training, that employs prior synonym knowledge and a weighted vector
distribution. On both extrinsic and intrinsic tasks, our methods are shown to
outperform the baselines by a large margin.
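The synonym-based retrofitting step described in the abstract can be illustrated with a minimal sketch in the spirit of lexicon-based retrofitting. This is not the authors' exact method: the function name, the uniform `alpha`/`beta` weights, and the iteration count are all assumptions for illustration; the paper's weighted vector distribution may differ.

```python
import numpy as np

def retrofit(embeddings, synonyms, alpha=1.0, beta=1.0, iters=10):
    """Pull each word vector toward the vectors of its synonyms.

    embeddings: dict word -> np.ndarray (pre-trained static vectors)
    synonyms:   dict word -> list of synonym words (prior lexical knowledge)
    alpha/beta: weights balancing fidelity to the original vector
                against agreement with synonyms (hypothetical defaults)
    """
    new = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iters):
        for word, neighbours in synonyms.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            # Weighted average of the original vector and the current
            # vectors of the word's synonyms.
            total = alpha * embeddings[word] + beta * sum(new[n] for n in nbrs)
            new[word] = total / (alpha + beta * len(nbrs))
    return new
```

Because the update is a convex combination, synonym vectors are drawn closer together with each pass while each word stays anchored to its original embedding.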
Related papers
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization [78.61621802973262]
We introduce an Orthogonal finetuning method for efficiently updating pretrained weights.
A cross-regularization strategy is also exploited to maintain stability in zero-shot generalization.
We conduct extensive experiments to demonstrate that our method explicitly steers pretrained weight space to represent the task-specific knowledge.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer).
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines on MoSECroT, it fails to match some strong baselines.
arXiv Detail & Related papers (2024-01-09T21:09:07Z)
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks.
We show its improvements in neural machine translation (NMT) and multilingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z)
- Accurate Neural Network Pruning Requires Rethinking Sparse Optimization [87.90654868505518]
We show the impact of high sparsity on model training using the standard computer vision and natural language processing sparsity benchmarks.
We provide new approaches for mitigating this issue for both sparse pre-training of vision models and sparse fine-tuning of language models.
arXiv Detail & Related papers (2023-08-03T21:49:14Z)
- Extracting Text Representations for Terms and Phrases in Technical Domains [9.27244202193623]
We propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices.
Models trained with this approach not only match the quality of sentence encoders in technical domains, but are also 5 times smaller and up to 10 times faster.
arXiv Detail & Related papers (2023-05-25T08:59:36Z)
- SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table Semantic Parsing [19.779493883522072]
This paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.
We propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner.
arXiv Detail & Related papers (2021-11-18T02:51:04Z)
- Obtaining Better Static Word Embeddings Using Contextual Embedding Models [53.86080627007695]
Our proposed distillation method is a simple extension of CBOW-based training.
As a side-effect, our approach also allows a fair comparison of both contextual and static embeddings.
arXiv Detail & Related papers (2021-06-08T12:59:32Z)
- Autoencoding Improves Pre-trained Word Embeddings [26.464097783864926]
We show, and experimentally verify, that retaining the top principal components is useful for improving pre-trained word embeddings.
arXiv Detail & Related papers (2020-10-25T11:30:05Z)
- Multiple Word Embeddings for Increased Diversity of Representation [15.279850826041066]
We show a technique that substantially and consistently improves performance over a strong baseline with negligible increase in run time.
We analyze aspects of pre-trained embedding similarity and vocabulary coverage and find that representational diversity is the driving force behind why this technique works.
arXiv Detail & Related papers (2020-09-30T02:33:09Z)
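The multiple-embeddings idea in the last entry can be sketched as concatenating lookups from several pre-trained embedding tables, so a downstream model sees more diverse representations of each word. The function name and zero-padding for out-of-vocabulary words are illustrative assumptions, not that paper's exact recipe.

```python
import numpy as np

def concat_embeddings(tables, word, dims):
    """Look up `word` in each pre-trained table and concatenate the results.

    tables: list of dicts word -> np.ndarray (e.g. GloVe, word2vec, fastText)
    dims:   each table's vector dimensionality, used to zero-pad
            out-of-vocabulary words (an assumed OOV strategy)
    """
    parts = []
    for table, dim in zip(tables, dims):
        parts.append(table.get(word, np.zeros(dim)))  # zero vector if OOV
    return np.concatenate(parts)
```

The result has a fixed dimensionality (the sum of the table dimensions), so it can feed any standard model while still carrying information from every source embedding.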
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.