An Analysis on the Learning Rules of the Skip-Gram Model
- URL: http://arxiv.org/abs/2003.08489v1
- Date: Wed, 18 Mar 2020 22:17:48 GMT
- Title: An Analysis on the Learning Rules of the Skip-Gram Model
- Authors: Canlin Zhang, Xiuwen Liu and Daniel Bis
- Abstract summary: We derive the learning rules for the skip-gram model and establish their close relationship to competitive learning.
We provide the global optimal solution constraints for the skip-gram model and validate them by experimental results.
- Score: 4.211128681972148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve the generalization of the representations for natural language
processing tasks, words are commonly represented using vectors, where distances
among the vectors are related to the similarity of the words. While word2vec,
the state-of-the-art implementation of the skip-gram model, is widely used and
improves the performance of many natural language processing tasks, its
mechanism is not yet well understood.
In this work, we derive the learning rules for the skip-gram model and
establish their close relationship to competitive learning. In addition, we
provide the global optimal solution constraints for the skip-gram model and
validate them by experimental results.
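To make the objective being analyzed concrete, below is a minimal sketch of a standard skip-gram update in the word2vec style (shown here with negative sampling for brevity; the paper derives learning rules for the skip-gram model in general, and the variable names below are illustrative rather than taken from the paper). The update already hints at the competitive-learning interpretation: the output vector of the observed context word is pulled toward the centre-word vector while sampled competitors are pushed away.

```python
# Minimal sketch of a skip-gram (word2vec-style) update with negative sampling.
# All names and hyperparameters here are illustrative assumptions, not the paper's notation.
import numpy as np

rng = np.random.default_rng(0)

V, d, lr = 1000, 50, 0.025                   # vocabulary size, embedding dim, learning rate
W_in = rng.normal(scale=0.1, size=(V, d))    # "input" (centre-word) vectors
W_out = np.zeros((V, d))                     # "output" (context-word) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_update(center, context, negatives):
    """One stochastic update for a (center, context) pair plus sampled negatives."""
    v = W_in[center]
    grad_v = np.zeros_like(v)
    # The observed context word has label 1; sampled negatives have label 0.
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = W_out[word]
        g = sigmoid(v @ u) - label           # gradient of the logistic loss w.r.t. the score
        grad_v += g * u
        W_out[word] -= lr * g * v            # output vectors compete for the centre word
    W_in[center] -= lr * grad_v

# Example: centre word 3 observed with context word 7, five random negative samples.
sgns_update(3, 7, rng.integers(0, V, size=5))
```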
Related papers
- Collapsed Language Models Promote Fairness [88.48232731113306]
We find that debiased language models exhibit collapsed alignment between token representations and word embeddings.
We design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods.
arXiv Detail & Related papers (2024-10-06T13:09:48Z)
- Contextual Dictionary Lookup for Knowledge Graph Completion [32.493168863565465]
Knowledge graph completion (KGC) aims to solve the incompleteness of knowledge graphs (KGs) by predicting missing links from known triples.
Most existing embedding models map each relation into a unique vector, overlooking its fine-grained semantics under different entities.
We present a novel method utilizing contextual dictionary lookup, enabling conventional embedding models to learn fine-grained semantics of relations in an end-to-end manner.
arXiv Detail & Related papers (2023-06-13T12:13:41Z)
- CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
arXiv Detail & Related papers (2023-05-23T16:32:27Z)
- Constructing Word-Context-Coupled Space Aligned with Associative Knowledge Relations for Interpretable Language Modeling [0.0]
The black-box structure of the deep neural networks in pre-trained language models severely limits the interpretability of the language modeling process.
A Word-Context-Coupled Space (W2CSpace) is proposed, which introduces an alignment process between uninterpretable neural representations and interpretable statistical logic.
Our language model achieves better performance and highly credible interpretability compared to related state-of-the-art methods.
arXiv Detail & Related papers (2023-05-19T09:26:02Z)
- A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches [5.065947993017158]
We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
Traditional approaches mostly use matrix factorization to produce word representations, and they do not capture the semantic and syntactic regularities of language very well.
On the other hand, neural-network-based approaches can capture sophisticated regularities of the language and preserve word relationships in the generated representations.
arXiv Detail & Related papers (2023-03-13T15:34:19Z)
- Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this pre-training objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Systematic Generalization on gSCAN with Language Conditioned Embedding [19.39687991647301]
Systematic Generalization refers to a learning algorithm's ability to extrapolate learned behavior to unseen situations.
We propose a novel method that learns objects' contextualized embeddings with dynamic message passing conditioned on the input natural language.
arXiv Detail & Related papers (2020-09-11T17:35:05Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, can effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
- Can We Learn Heuristics For Graphical Model Inference Using Reinforcement Learning? [114.24881214319048]
We show that we can learn programs, i.e., policies, for solving inference in higher order Conditional Random Fields (CRFs) using reinforcement learning.
Our method solves inference tasks efficiently without imposing any constraints on the form of the potentials.
arXiv Detail & Related papers (2020-04-27T19:24:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.