Pre-training Transformers for Knowledge Graph Completion
- URL: http://arxiv.org/abs/2303.15682v1
- Date: Tue, 28 Mar 2023 02:10:37 GMT
- Title: Pre-training Transformers for Knowledge Graph Completion
- Authors: Sanxing Chen, Hao Cheng, Xiaodong Liu, Jian Jiao, Yangfeng Ji and
Jianfeng Gao
- Abstract summary: We introduce a novel inductive KG representation model (iHT) for learning transferable representations of knowledge graphs.
iHT consists of an entity encoder (e.g., BERT) and a neighbor-aware relational scoring function, both parameterized by Transformers.
Our approach achieves new state-of-the-art results on matched evaluations, with a relative improvement of more than 25% in mean reciprocal rank over previous SOTA models.
- Score: 81.4078733132239
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning transferable representations of knowledge graphs (KGs) is
challenging due to the heterogeneous, multi-relational nature of graph
structures. Inspired by the success of Transformer-based pretrained language
models at learning transferable representations of text, we introduce a novel
inductive KG representation model (iHT) for KG completion by large-scale
pre-training. iHT consists of an entity encoder (e.g., BERT) and a
neighbor-aware relational scoring function, both parameterized by Transformers.
We first pre-train iHT on a large KG dataset, Wikidata5M. Our approach achieves
new state-of-the-art results on matched evaluations, with a relative
improvement of more than 25% in mean reciprocal rank over previous SOTA models.
When further fine-tuned on smaller KGs with either entity or relational shifts,
the pre-trained iHT representations are shown to be transferable, significantly
improving performance on FB15K-237 and WN18RR.
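The abstract describes iHT only at a high level: a BERT-style entity encoder plus a neighbor-aware relational scoring function, both parameterized by Transformers, evaluated by mean reciprocal rank (MRR). The following is a minimal PyTorch sketch of what such a scorer and the MRR metric could look like; module names, dimensions, and the sequence layout are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a neighbor-aware, Transformer-parameterized relational scorer in the
# spirit of iHT.  Assumes entity vectors come from a BERT-like text encoder;
# everything below (names, shapes, pooling) is an illustrative guess.
import torch
import torch.nn as nn


class TransformerRelationalScorer(nn.Module):
    """Encodes [head, relation, neighbors] with a Transformer and scores every
    candidate tail entity by dot product with the pooled context vector."""

    def __init__(self, dim=256, n_heads=4, n_layers=2, n_relations=100):
        super().__init__()
        self.relation_emb = nn.Embedding(n_relations, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, head_vec, relation_id, neighbor_vecs, candidate_vecs):
        # head_vec: (B, d) encoding of the head entity's text.
        # neighbor_vecs: (B, K, d) encodings of the head's graph neighbors.
        # candidate_vecs: (|E|, d) encodings of all candidate tail entities.
        rel_vec = self.relation_emb(relation_id).unsqueeze(1)         # (B, 1, d)
        seq = torch.cat([head_vec.unsqueeze(1), rel_vec, neighbor_vecs], dim=1)
        ctx = self.encoder(seq)[:, 0]                                 # (B, d)
        return ctx @ candidate_vecs.t()                               # (B, |E|)


def mean_reciprocal_rank(scores, true_tail):
    """Standard MRR over a batch: scores (B, |E|), true_tail (B,) gold indices."""
    gold = scores.gather(1, true_tail.unsqueeze(1))
    ranks = (scores > gold).sum(dim=1) + 1
    return (1.0 / ranks.float()).mean()


# Toy usage with random tensors, just to show the expected shapes.
B, K, E, d = 2, 3, 50, 256
scorer = TransformerRelationalScorer(dim=d)
scores = scorer(torch.randn(B, d), torch.randint(0, 100, (B,)),
                torch.randn(B, K, d), torch.randn(E, d))
print(mean_reciprocal_rank(scores, torch.randint(0, E, (B,))))
```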
Related papers
- Learning and Transferring Sparse Contextual Bigrams with Linear Transformers [47.37256334633102]
We introduce the Sparse Contextual Bigram (SCB) model, where the next token's generation depends on a sparse set of earlier positions determined by the last token.
We analyze the training dynamics and sample complexity of learning SCB using a one-layer linear transformer with a gradient-based algorithm.
We prove that, provided a nontrivial correlation between the downstream and pretraining tasks, finetuning from a pretrained model allows us to bypass the initial sample-intensive stage.
arXiv Detail & Related papers (2024-10-30T20:29:10Z)
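To make the dependency structure above concrete: in a sparse contextual bigram, the last token selects a small set of earlier positions, and the next token is determined by the tokens found there. The toy generator below is only a sketch of that description; the position map and the combination rule (sum modulo vocabulary size) are invented for illustration and are not the paper's actual generative process.

```python
# Toy "sparse contextual bigram"-style generator: the last token picks which
# earlier positions matter; the next token is a simple function of them.
import random

VOCAB = 8
# Hypothetical map: last token -> offsets (counted back from the sequence end)
# of the positions the next token depends on.
SPARSE_POSITIONS = {t: random.sample(range(1, 6), k=2) for t in range(VOCAB)}


def next_token(sequence):
    last = sequence[-1]
    deps = [sequence[-off] for off in SPARSE_POSITIONS[last]]
    return sum(deps) % VOCAB            # illustrative rule, not the paper's


seq = [random.randrange(VOCAB) for _ in range(6)]   # random prefix
for _ in range(10):
    seq.append(next_token(seq))
print(seq)
```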
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
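As a rough illustration of sampling node contexts through random walks, the step the summary attributes to GSPT, the sketch below walks over a toy adjacency list; the graph representation, walk length, and stopping rule are assumptions, not GSPT's actual pipeline.

```python
# Sample a node context as the sequence of nodes visited by a random walk.
import random


def random_walk_context(adj, start, walk_length=8):
    """Return the node ids visited by a simple uniform random walk."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj.get(walk[-1], [])
        if not neighbors:        # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk


# Toy undirected graph as an adjacency list.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(random_walk_context(adj, start=0))
```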
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that, for the first time, a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Semantic-visual Guided Transformer for Few-shot Class-incremental Learning [6.300141694311465]
We develop a semantic-visual guided Transformer (SV-T) to enhance the feature-extraction capacity of the pre-trained feature backbone on incremental classes.
SV-T takes full advantage of additional supervision from the base classes and further improves the training robustness of the feature backbone.
arXiv Detail & Related papers (2023-03-27T15:06:49Z)
- Learning to Grow Pretrained Models for Efficient Transformer Training [72.20676008625641]
We learn to grow pretrained transformers by learning a linear map from the parameters of a smaller model to an initialization of the larger model.
Experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch.
arXiv Detail & Related papers (2023-03-02T05:21:18Z)
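The core idea above, a learned linear map from small-model parameters to a larger model's initialization, can be sketched as follows. The factorized row/column expansion is one illustrative parameterization, not necessarily the exact form LiGO uses.

```python
# Grow a small weight matrix into a larger one via learnable linear maps.
import torch
import torch.nn as nn


class LinearWidthGrowth(nn.Module):
    def __init__(self, d_small, d_large):
        super().__init__()
        # Learnable expansion maps for the output and input dimensions.
        self.row_map = nn.Parameter(torch.randn(d_large, d_small) * 0.02)
        self.col_map = nn.Parameter(torch.randn(d_large, d_small) * 0.02)

    def forward(self, w_small):
        # w_small: (d_small, d_small) -> (d_large, d_large)
        return self.row_map @ w_small @ self.col_map.t()


grow = LinearWidthGrowth(d_small=64, d_large=128)
w_small = torch.randn(64, 64)       # a weight matrix from the small model
w_large_init = grow(w_small)        # initialization for the larger model
print(w_large_init.shape)           # torch.Size([128, 128])
```

Per the summary, the map itself is learned so that the grown weights give a good starting point, after which training of the larger model continues as usual.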
- Link-Intensive Alignment for Incomplete Knowledge Graphs [28.213397255810936]
In this work, we address the problem of aligning incomplete KGs with representation learning.
Our framework exploits two feature channels: transitivity-based and proximity-based.
The two feature channels are jointly learned to exchange important features between the input KGs.
We also develop a missing-link detector that discovers and recovers missing links during training.
arXiv Detail & Related papers (2021-12-17T00:41:28Z)
- Vector-quantized Image Modeling with Improved VQGAN [93.8443646643864]
We propose a Vector-quantized Image Modeling approach that involves pretraining a Transformer to predict image tokens autoregressively.
We first propose multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity.
When trained on ImageNet at 256x256 resolution, we achieve Inception Score (IS) of 175.1 and Frechet Inception Distance (FID) of 4.17, a dramatic improvement over the vanilla VQGAN.
arXiv Detail & Related papers (2021-10-09T18:36:00Z)
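The "image tokens" mentioned above come from a vector-quantization step: each continuous encoder feature is replaced by the index of its nearest codebook entry, and the resulting discrete sequence is what the autoregressive Transformer models. The sketch below shows that lookup in isolation; codebook size and feature dimension are illustrative, and the rest of VQGAN training (reconstruction, adversarial and perceptual losses, codebook updates) is omitted.

```python
# Nearest-codebook quantization: continuous features -> discrete token ids.
import torch


def quantize(features, codebook):
    """features: (N, d), codebook: (K, d) -> (token_ids (N,), quantized (N, d))."""
    dists = torch.cdist(features, codebook) ** 2   # squared Euclidean distances
    token_ids = dists.argmin(dim=1)                # index of the nearest code
    return token_ids, codebook[token_ids]


codebook = torch.randn(1024, 256)      # K = 1024 codes of dimension 256
features = torch.randn(16 * 16, 256)   # a 16x16 grid of encoder features
tokens, quantized = quantize(features, codebook)
print(tokens.shape, quantized.shape)   # torch.Size([256]) torch.Size([256, 256])
```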
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the joint submission of the University of Sydney and JD to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Non-autoregressive Transformer-based End-to-end ASR using BERT [13.07939371864781]
This paper presents a non-autoregressive Transformer-based end-to-end automatic speech recognition (ASR) model built on BERT.
A series of experiments conducted on the AISHELL-1 dataset demonstrates competitive or superior results.
arXiv Detail & Related papers (2021-04-10T16:22:17Z)