SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models
- URL: http://arxiv.org/abs/2203.02167v1
- Date: Fri, 4 Mar 2022 07:36:30 GMT
- Title: SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models
- Authors: Liang Wang, Wei Zhao, Zhuoyu Wei, Jingming Liu
- Abstract summary: In this paper, we introduce three types of negatives: in-batch negatives, pre-batch negatives, and self-negatives which act as a simple form of hard negatives.
Our proposed model SimKGC can substantially outperform embedding-based methods on several benchmark datasets.
In terms of mean reciprocal rank (MRR), we advance the state-of-the-art by +19% on WN18RR, +6.8% on the Wikidata5M transductive setting, and +22% on the Wikidata5M inductive setting.
- Score: 9.063614185765855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge graph completion (KGC) aims to reason over known facts and infer
the missing links. Text-based methods such as KG-BERT (Yao et al., 2019) learn
entity representations from natural language descriptions, and have the
potential for inductive KGC. However, the performance of text-based methods
still largely lags behind that of graph embedding-based methods like TransE (Bordes et
al., 2013) and RotatE (Sun et al., 2019b). In this paper, we identify that the
key issue is efficient contrastive learning. To improve the learning
efficiency, we introduce three types of negatives: in-batch negatives,
pre-batch negatives, and self-negatives which act as a simple form of hard
negatives. Combined with InfoNCE loss, our proposed model SimKGC can
substantially outperform embedding-based methods on several benchmark datasets.
In terms of mean reciprocal rank (MRR), we advance the state-of-the-art by +19%
on WN18RR, +6.8% on the Wikidata5M transductive setting, and +22% on the
Wikidata5M inductive setting. Thorough analyses are conducted to gain insights
into each component. Our code is available at
https://github.com/intfloat/SimKGC .
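As a concrete illustration of the objective described in the abstract, here is a minimal sketch of InfoNCE combined with the three negative types; tensor names, shapes, and the temperature value are illustrative assumptions, not the authors' implementation (see the repository above for that).

```python
import torch
import torch.nn.functional as F

def info_nce_loss(hr_emb, tail_emb, pre_batch_emb=None, self_emb=None, tau=0.05):
    """InfoNCE over candidate tails (illustrative sketch).

    hr_emb:        (B, d) encodings of (head, relation) queries
    tail_emb:      (B, d) encodings of the gold tails; row i is the positive
                   for query i, all other rows act as in-batch negatives
    pre_batch_emb: (M, d) cached tail encodings from earlier batches
                   (pre-batch negatives), optional
    self_emb:      (B, d) encodings of the heads themselves
                   (self-negatives), optional
    tau:           temperature (0.05 is an assumed value)
    """
    hr_emb = F.normalize(hr_emb, dim=-1)
    tail_emb = F.normalize(tail_emb, dim=-1)

    # (B, B): diagonal = positives, off-diagonal = in-batch negatives
    logits = hr_emb @ tail_emb.t()

    if pre_batch_emb is not None:  # append pre-batch negatives
        pre = F.normalize(pre_batch_emb, dim=-1)
        logits = torch.cat([logits, hr_emb @ pre.t()], dim=1)
    if self_emb is not None:       # append one self-negative per query
        self_score = (hr_emb * F.normalize(self_emb, dim=-1)).sum(-1, keepdim=True)
        logits = torch.cat([logits, self_score], dim=1)

    labels = torch.arange(hr_emb.size(0), device=hr_emb.device)
    return F.cross_entropy(logits / tau, labels)
```

The design intuition: in-batch negatives reuse tails already encoded for the batch, pre-batch negatives enlarge the pool from a cache without re-encoding, and the self-negative penalizes ranking the head itself as the answer.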
Related papers
- InstructEngine: Instruction-driven Text-to-Image Alignment [39.591411427738095]
InstructEngine improves SD v1.5 and SDXL's performance by 10.53% and 5.30%, outperforming state-of-the-art baselines.
A win rate of over 50% in human reviews also shows that InstructEngine better aligns with human preferences.
arXiv Detail & Related papers (2025-04-14T15:36:28Z)
- NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts [57.53692236201343]
We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text, and vision-to-text datasets.
NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
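The summary only names the architecture, so the following is a speculative sketch of hard, task-oriented expert routing; the class name, dimensions, and per-task routing granularity are all assumptions, not NeKo's actual design.

```python
import torch.nn as nn

class TaskOrientedMoE(nn.Module):
    """Speculative sketch: each correction task is hard-routed to its own
    feed-forward expert. All names and sizes are illustrative assumptions."""

    TASKS = {"speech-to-text": 0, "language-to-text": 1, "vision-to-text": 2}

    def __init__(self, d_model=768, d_ff=3072):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in self.TASKS
        )

    def forward(self, hidden, task):
        # route the whole sequence to the expert that owns this task
        return self.experts[self.TASKS[task]](hidden)
```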
arXiv Detail & Related papers (2024-11-08T20:11:24Z)
- MoCoSA: Momentum Contrast for Knowledge Graph Completion with Structure-Augmented Pre-trained Language Models [11.57782182864771]
We propose Momentum Contrast for knowledge graph completion with Structure-Augmented pre-trained language models (MoCoSA).
Our approach achieves state-of-the-art performance in terms of mean reciprocal rank (MRR), with improvements of 2.5% on WN18RR and 21% on OpenBG500.
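The summary does not spell out the momentum mechanism; below is the standard MoCo-style exponential-moving-average update that "momentum contrast" usually refers to, offered as a hedged sketch rather than MoCoSA's exact recipe.

```python
import torch

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    """MoCo-style momentum update: the key encoder trails the query encoder
    as an exponential moving average. m=0.999 is the common MoCo default,
    assumed here for illustration."""
    for q_param, k_param in zip(query_encoder.parameters(),
                                key_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)
```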
arXiv Detail & Related papers (2023-08-16T08:09:10Z)
- LeTI: Learning to Generate from Textual Interactions [60.425769582343506]
We explore LMs' potential to learn from textual interactions (LETI) that not only check their correctness with binary labels but also pinpoint and explain errors in their outputs through textual feedback.
Our focus is the code generation task, where the model produces code based on natural language instructions.
LETI iteratively fine-tunes the model, using the LM objective, on a concatenation of natural language instructions, LM-generated programs, and textual feedback.
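A minimal sketch of the concatenation step just described; the separator and field order are guesses for illustration, not the paper's actual format.

```python
def build_leti_example(instruction, generated_program, feedback, sep="\n\n"):
    """Hypothetical formatting of one LETI fine-tuning example: the natural
    language instruction, the LM-generated program, and the textual feedback
    are concatenated into a single training sequence."""
    return sep.join([instruction, generated_program, feedback])
```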
arXiv Detail & Related papers (2023-05-17T15:53:31Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Out-of-Vocabulary Entities in Link Prediction [1.9036571490366496]
Link prediction is often used as a proxy to evaluate the quality of embeddings.
As benchmarks are crucial for the fair comparison of algorithms, ensuring their quality is tantamount to providing a solid ground for developing better solutions.
We provide an implementation of an approach for spotting and removing such entities and provide corrected versions of the datasets WN18RR, FB15K-237, and YAGO3-10.
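The cleaning idea lends itself to a few lines of code; this sketch assumes triples are (head, relation, tail) tuples, and the function and variable names are illustrative, not the authors' released implementation.

```python
def remove_oov_entities(train, valid, test):
    """Drop evaluation triples that mention an entity never seen in the
    training split: such entities are out-of-vocabulary, so models that
    learn embeddings only for training entities cannot score them fairly."""
    seen = {e for h, _, t in train for e in (h, t)}

    def keep(split):
        return [(h, r, t) for h, r, t in split if h in seen and t in seen]

    return train, keep(valid), keep(test)
```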
arXiv Detail & Related papers (2021-05-26T12:58:18Z)
- Combining Label Propagation and Simple Models Out-performs Graph Neural Networks [52.121819834353865]
We show that for many standard transductive node classification benchmarks, combining label propagation with simple models can exceed or match the performance of state-of-the-art GNNs on a wide variety of benchmarks.
We call this overall procedure Correct and Smooth (C&S).
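A minimal sketch of the two C&S stages on a dense normalized adjacency; hyperparameter values and the anchored-propagation form are illustrative assumptions, not the paper's tuned configuration.

```python
import numpy as np

def correct_and_smooth(A_hat, base_pred, y_train, train_idx,
                       alpha1=0.8, alpha2=0.8, iters=50):
    """A_hat: symmetrically normalized adjacency (dense for brevity).
    base_pred: class probabilities from a simple base model, shape (N, C).
    y_train: one-hot labels for the training nodes, shape (len(train_idx), C).
    """
    # Correct: propagate the base model's residual errors over the graph
    E = np.zeros_like(base_pred)
    E[train_idx] = y_train - base_pred[train_idx]
    E0 = E.copy()
    for _ in range(iters):
        E = alpha1 * (A_hat @ E) + (1 - alpha1) * E0
    Z = base_pred + E  # corrected predictions

    # Smooth: propagate the corrected predictions themselves
    Z0 = Z.copy()
    for _ in range(iters):
        Z = alpha2 * (A_hat @ Z) + (1 - alpha2) * Z0
    return Z
```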
arXiv Detail & Related papers (2020-10-27T02:10:52Z)
- SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding aims to learn a latent representation for each node in an unsupervised manner.
A key to the success of such contrastive learning methods is how positive and negative samples are drawn.
In this paper, we propose SCE for unsupervised network embedding only using negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
- Knowledge Base Completion: Baseline strikes back (Again) [36.52445566431404]
Knowledge Base Completion (KBC) has been a very active area lately.
Recent developments allow us to use all available negative samples for training.
We show that ComplEx, when trained using all available negative samples, gives near state-of-the-art performance on all the datasets.
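Training with "all available negatives" amounts to a softmax over the full entity set under the ComplEx scoring function; the sketch below assumes embeddings stored as concatenated real and imaginary halves, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def complex_score(h, r, E):
    """ComplEx scores of (h, r, ?) against every entity embedding in E.
    h, r: (d,) tensors; E: (N, d); each split into real/imag halves.
    score(h, r, t) = Re(<h, r, conj(t)>), expanded into real arithmetic."""
    h_re, h_im = h.chunk(2, dim=-1)
    r_re, r_im = r.chunk(2, dim=-1)
    E_re, E_im = E.chunk(2, dim=-1)
    q_re = h_re * r_re - h_im * r_im   # real part of h * r
    q_im = h_re * r_im + h_im * r_re   # imaginary part of h * r
    return E_re @ q_re + E_im @ q_im   # (N,) scores over all entities

def one_vs_all_loss(h, r, gold_tail_idx, E):
    """Softmax cross-entropy over the full entity set: every non-gold
    entity serves as a negative sample."""
    logits = complex_score(h, r, E).unsqueeze(0)
    return F.cross_entropy(logits, torch.tensor([gold_tail_idx]))
```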
arXiv Detail & Related papers (2020-05-02T11:53:22Z)
- Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
- Reinforced Negative Sampling over Knowledge Graph for Recommendation [106.07209348727564]
We develop a new negative sampling model, Knowledge Graph Policy Network (KGPolicy), which works as a reinforcement learning agent to explore high-quality negatives.
KGPolicy navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender.
arXiv Detail & Related papers (2020-03-12T12:44:30Z)