Pre-training Transformers for Knowledge Graph Completion
- URL: http://arxiv.org/abs/2303.15682v1
- Date: Tue, 28 Mar 2023 02:10:37 GMT
- Title: Pre-training Transformers for Knowledge Graph Completion
- Authors: Sanxing Chen, Hao Cheng, Xiaodong Liu, Jian Jiao, Yangfeng Ji and
Jianfeng Gao
- Abstract summary: We introduce a novel inductive KG representation model (iHT) for learning transferable representations of knowledge graphs.
iHT consists of an entity encoder (e.g., BERT) and a neighbor-aware relational scoring function, both parameterized by Transformers.
Our approach achieves new state-of-the-art results on matched evaluations, with a relative improvement of more than 25% in mean reciprocal rank over previous SOTA models.
- Score: 81.4078733132239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning transferable representations of knowledge graphs (KGs) is
challenging due to the heterogeneous, multi-relational nature of graph
structures. Inspired by the success of Transformer-based pretrained language
models at learning transferable representations of text, we introduce a novel
inductive KG representation model (iHT) for KG completion by large-scale
pre-training. iHT consists of an entity encoder (e.g., BERT) and a
neighbor-aware relational scoring function, both parameterized by Transformers.
We first pre-train iHT on a large KG dataset, Wikidata5M. Our approach achieves
new state-of-the-art results on matched evaluations, with a relative
improvement of more than 25% in mean reciprocal rank over previous SOTA models.
When further fine-tuned on smaller KGs with either entity or relational shifts,
the pre-trained iHT representations are shown to be transferable, significantly
improving performance on FB15K-237 and WN18RR.
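The abstract describes iHT only at a high level: a BERT-style entity encoder plus a neighbor-aware relational scoring function, both parameterized by Transformers, evaluated by mean reciprocal rank (MRR). The following is a minimal PyTorch sketch of what such a scorer and the MRR metric could look like; module names, dimensions, and the sequence layout are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a neighbor-aware, Transformer-parameterized relational scorer in the
# spirit of iHT.  Assumes entity vectors come from a BERT-like text encoder;
# everything below (names, shapes, pooling) is an illustrative guess.
import torch
import torch.nn as nn


class TransformerRelationalScorer(nn.Module):
    """Encodes [head, relation, neighbors] with a Transformer and scores every
    candidate tail entity by dot product with the pooled context vector."""

    def __init__(self, dim=256, n_heads=4, n_layers=2, n_relations=100):
        super().__init__()
        self.relation_emb = nn.Embedding(n_relations, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, head_vec, relation_id, neighbor_vecs, candidate_vecs):
        # head_vec: (B, d) encoding of the head entity's text.
        # neighbor_vecs: (B, K, d) encodings of the head's graph neighbors.
        # candidate_vecs: (|E|, d) encodings of all candidate tail entities.
        rel_vec = self.relation_emb(relation_id).unsqueeze(1)         # (B, 1, d)
        seq = torch.cat([head_vec.unsqueeze(1), rel_vec, neighbor_vecs], dim=1)
        ctx = self.encoder(seq)[:, 0]                                 # (B, d)
        return ctx @ candidate_vecs.t()                               # (B, |E|)


def mean_reciprocal_rank(scores, true_tail):
    """Standard MRR over a batch: scores (B, |E|), true_tail (B,) gold indices."""
    gold = scores.gather(1, true_tail.unsqueeze(1))
    ranks = (scores > gold).sum(dim=1) + 1
    return (1.0 / ranks.float()).mean()


# Toy usage with random tensors, just to show the expected shapes.
B, K, E, d = 2, 3, 50, 256
scorer = TransformerRelationalScorer(dim=d)
scores = scorer(torch.randn(B, d), torch.randint(0, 100, (B,)),
                torch.randn(B, K, d), torch.randn(E, d))
print(mean_reciprocal_rank(scores, torch.randint(0, E, (B,))))
```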
Related papers
- Learning and Transferring Sparse Contextual Bigrams with Linear Transformers [47.37256334633102]
We introduce the Sparse Contextual Bigram (SCB) model, where the next token's generation depends on a sparse set of earlier positions determined by the last token.
We analyze the training dynamics and sample complexity of learning SCB using a one-layer linear transformer with a gradient-based algorithm.
We prove that, provided a nontrivial correlation between the downstream and pretraining tasks, finetuning from a pretrained model allows us to bypass the initial sample-intensive stage.
arXiv Detail & Related papers (2024-10-30T20:29:10Z)
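To make the dependency structure above concrete: in a sparse contextual bigram, the last token selects a small set of earlier positions, and the next token is determined by the tokens found there. The toy generator below is only a sketch of that description; the position map and the combination rule (sum modulo vocabulary size) are invented for illustration and are not the paper's actual generative process.

```python
# Toy "sparse contextual bigram"-style generator: the last token picks which
# earlier positions matter; the next token is a simple function of them.
import random

VOCAB = 8
# Hypothetical map: last token -> offsets (counted back from the sequence end)
# of the positions the next token depends on.
SPARSE_POSITIONS = {t: random.sample(range(1, 6), k=2) for t in range(VOCAB)}


def next_token(sequence):
    last = sequence[-1]
    deps = [sequence[-off] for off in SPARSE_POSITIONS[last]]
    return sum(deps) % VOCAB            # illustrative rule, not the paper's


seq = [random.randrange(VOCAB) for _ in range(6)]   # random prefix
for _ in range(10):
    seq.append(next_token(seq))
print(seq)
```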
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
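As a rough illustration of sampling node contexts through random walks, the step the summary attributes to GSPT, the sketch below walks over a toy adjacency list; the graph representation, walk length, and stopping rule are assumptions, not GSPT's actual pipeline.

```python
# Sample a node context as the sequence of nodes visited by a random walk.
import random


def random_walk_context(adj, start, walk_length=8):
    """Return the node ids visited by a simple uniform random walk."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj.get(walk[-1], [])
        if not neighbors:        # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk


# Toy undirected graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk_context(adj, start=0))
```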
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that, for the first time, a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Semantic-visual Guided Transformer for Few-shot Class-incremental Learning [6.300141694311465]
We develop a semantic-visual guided Transformer (SV-T) to enhance the feature-extraction capacity of the pre-trained feature backbone on incremental classes.
SV-T takes full advantage of additional supervision from the base classes and further improves the training robustness of the feature backbone.
arXiv Detail & Related papers (2023-03-27T15:06:49Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear map from the parameters of a smaller model to an initialization of the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
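The core idea above, a learned linear map from small-model parameters to a larger model's initialization, can be sketched as follows. The factorized row/column expansion is one illustrative parameterization, not necessarily the exact form LiGO uses.

```python
# Grow a small weight matrix into a larger one via learnable linear maps.
import torch
import torch.nn as nn


class LinearWidthGrowth(nn.Module):
    def __init__(self, d_small, d_large):
        super().__init__()
        # Learnable expansion maps for the output and input dimensions.
        self.row_map = nn.Parameter(torch.randn(d_large, d_small) * 0.02)
        self.col_map = nn.Parameter(torch.randn(d_large, d_small) * 0.02)

    def forward(self, w_small):
        # w_small: (d_small, d_small) -> (d_large, d_large)
        return self.row_map @ w_small @ self.col_map.t()


grow = LinearWidthGrowth(d_small=64, d_large=128)
w_small = torch.randn(64, 64)       # a weight matrix from the small model
w_large_init = grow(w_small)        # initialization for the larger model
print(w_large_init.shape)           # torch.Size([128, 128])
```

Per the summary, the map itself is learned so that the grown weights give a good starting point, after which training of the larger model continues as usual.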
- Link-Intensive Alignment for Incomplete Knowledge Graphs [28.213397255810936]
In this work, we address the problem of aligning incomplete KGs with representation learning.
Our framework exploits two feature channels: transitivity-based and proximity-based.
The two feature channels are jointly learned to exchange important features between the input KGs.
We also develop a missing-link detector that discovers and recovers missing links during training.
arXiv Detail & Related papers (2021-12-17T00:41:28Z)
- Vector-quantized Image Modeling with Improved VQGAN [93.8443646643864]
We propose a Vector-quantized Image Modeling approach that involves pretraining a Transformer to predict image tokens autoregressively.
We first propose multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity.
When trained on ImageNet at 256x256 resolution, we achieve Inception Score (IS) of 175.1 and Frechet Inception Distance (FID) of 4.17, a dramatic improvement over the vanilla VQGAN.
arXiv Detail & Related papers (2021-10-09T18:36:00Z)
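The "image tokens" mentioned above come from a vector-quantization step: each continuous encoder feature is replaced by the index of its nearest codebook entry, and the resulting discrete sequence is what the autoregressive Transformer models. The sketch below shows that lookup in isolation; codebook size and feature dimension are illustrative, and the rest of VQGAN training (reconstruction, adversarial and perceptual losses, codebook updates) is omitted.

```python
# Nearest-codebook quantization: continuous features -> discrete token ids.
import torch


def quantize(features, codebook):
    """features: (N, d), codebook: (K, d) -> (token_ids (N,), quantized (N, d))."""
    dists = torch.cdist(features, codebook) ** 2   # squared Euclidean distances
    token_ids = dists.argmin(dim=1)                # index of the nearest code
    return token_ids, codebook[token_ids]


codebook = torch.randn(1024, 256)      # K = 1024 codes of dimension 256
features = torch.randn(16 * 16, 256)   # a 16x16 grid of encoder features
tokens, quantized = quantize(features, codebook)
print(tokens.shape, quantized.shape)   # torch.Size([256]) torch.Size([256, 256])
```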
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the joint submission of the University of Sydney and JD to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Non-autoregressive Transformer-based End-to-end ASR using BERT [13.07939371864781]
This paper presents a non-autoregressive Transformer-based end-to-end automatic speech recognition (ASR) model built on BERT.
A series of experiments conducted on the AISHELL-1 dataset demonstrates competitive or superior results.
arXiv Detail & Related papers (2021-04-10T16:22:17Z)