Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex
Logical Queries
- URL: http://arxiv.org/abs/2208.07638v1
- Date: Tue, 16 Aug 2022 09:51:26 GMT
- Title: Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex
Logical Queries
- Authors: Xiao Liu, Shiyu Zhao, Kai Su, Yukuo Cen, Jiezhong Qiu, Mengdi Zhang,
Wei Wu, Yuxiao Dong, Jie Tang
- Abstract summary: We present the Knowledge Graph Transformer (kgTransformer) with masked pre-training and fine-tuning strategies.
kgTransformer can consistently outperform both KG embedding-based baselines and advanced encoders on nine in-domain and out-of-domain reasoning tasks.
- Score: 36.22117601006972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge graph (KG) embeddings have been a mainstream approach for reasoning
over incomplete KGs. However, limited by their inherently shallow and static
architectures, they can hardly deal with the rising focus on complex logical
queries, which comprise logical operators, imputed edges, multiple source
entities, and unknown intermediate entities. In this work, we present the
Knowledge Graph Transformer (kgTransformer) with masked pre-training and
fine-tuning strategies. We design a KG triple transformation method that enables
the Transformer to handle KGs, further strengthened by Mixture-of-Experts (MoE)
sparse activation. We then formulate complex
logical queries as masked prediction and introduce a two-stage masked
pre-training strategy to improve transferability and generalizability.
Extensive experiments on two benchmarks demonstrate that kgTransformer can
consistently outperform both KG embedding-based baselines and advanced encoders
on nine in-domain and out-of-domain reasoning tasks. Additionally, kgTransformer
offers explainability by providing full reasoning paths that interpret its answers.
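A minimal sketch of the core formulation described above, assuming a PyTorch implementation: a complex logical query is linearized into entity/relation tokens with the unknown entities replaced by a mask token, and a Transformer encoder predicts the entity at the masked answer position. The class name, vocabulary scheme, and hyperparameters below are illustrative assumptions, not the authors' code; the paper's KG triple transformation and Mixture-of-Experts layers are omitted for brevity.

```python
# Illustrative sketch (not the authors' implementation) of complex logical
# queries as masked prediction over a linearized query graph.
import torch
import torch.nn as nn

class MaskedQueryTransformer(nn.Module):
    def __init__(self, num_entities, num_relations, dim=256, heads=4, layers=3):
        super().__init__()
        # Assumed shared vocabulary: [MASK] + entities + relations.
        self.mask_id = 0
        self.ent_offset = 1
        self.rel_offset = 1 + num_entities
        vocab = 1 + num_entities + num_relations
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, num_entities)  # scores candidate answer entities

    def forward(self, token_ids, mask_positions):
        # token_ids: (batch, seq_len) linearized query; mask_positions: (batch,)
        h = self.encoder(self.embed(token_ids))
        masked_states = h[torch.arange(h.size(0)), mask_positions]
        return self.head(masked_states)  # logits over all entities

# Toy 2-hop projection query "which ?x satisfies r2(r1(e0), ?x)?", linearized as
# [e0, r1, MASK(intermediate), r2, MASK(answer)]; we predict the final mask.
model = MaskedQueryTransformer(num_entities=100, num_relations=10)
e0, r1, r2 = 5, 2, 7
tokens = torch.tensor([[model.ent_offset + e0, model.rel_offset + r1, model.mask_id,
                        model.rel_offset + r2, model.mask_id]])
logits = model(tokens, mask_positions=torch.tensor([4]))
print(logits.shape)  # torch.Size([1, 100])
```

The toy query masks both the intermediate and the answer entity, mirroring the abstract's point that complex queries contain unknown intermediate entities; only the answer position is scored here.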
Related papers
- SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs [38.517345561999115]
SymAgent is an innovative neural-symbolic agent framework that achieves collaborative augmentation between Knowledge Graphs and Large Language Models.
We conceptualize KGs as dynamic environments and transform complex reasoning tasks into a multi-step interactive process, enabling KGs to participate deeply in the reasoning process.
arXiv Detail & Related papers (2025-02-05T15:37:05Z)
- Enhancing Transformers for Generalizable First-Order Logical Entailment [51.04944136538266]
This paper investigates the generalizable first-order logical reasoning ability of transformers with their parameterized knowledge.
The first-order reasoning capability of transformers is assessed through their ability to perform first-order logical entailment.
We propose a more sophisticated, logic-aware architecture, TEGA, to enhance the capability for generalizable first-order logical entailment in transformers.
arXiv Detail & Related papers (2025-01-01T07:05:32Z)
- Unraveling the Gradient Descent Dynamics of Transformers [37.096572564254515]
Gradient Descent (GD) can train a Transformer model to a globally optimal solution, especially when the input embedding dimension is large.
We analyze the loss landscape of a single Transformer layer using Softmax and Gaussian attention kernels.
arXiv Detail & Related papers (2024-11-12T04:33:56Z)
- Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains [66.55612528039894]
Knowledge Graphs (KGs) can serve as reliable knowledge sources for question answering (QA).
We present DoG (Decoding on Graphs), a novel framework that facilitates a deep synergy between LLMs and KGs.
Experiments across various KGQA tasks with different background KGs demonstrate that DoG achieves superior and robust performance.
arXiv Detail & Related papers (2024-10-24T04:01:40Z)
- On the Convergence of Encoder-only Shallow Transformers [62.639819460956176]
We build the global convergence theory of encoder-only shallow Transformers under a realistic setting.
Our results can pave the way for a better understanding of modern Transformers, particularly on training dynamics.
arXiv Detail & Related papers (2023-11-02T20:03:05Z)
- Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs [67.043747188954]
We propose a structure-modeled textual encoding framework for inductive logical reasoning over KGs.
It encodes linearized query structures and entities using pre-trained language models to find answers.
We conduct experiments on two inductive logical reasoning datasets and three transductive datasets.
arXiv Detail & Related papers (2023-05-23T01:25:29Z)
- Pre-training Transformers for Knowledge Graph Completion [81.4078733132239]
We introduce a novel inductive KG representation model (iHT) for learning transferable representations of knowledge graphs.
iHT consists of an entity encoder (e.g., BERT) and a neighbor-aware relational scoring function, both parameterized by Transformers.
Our approach achieves new state-of-the-art results on matched evaluations, with a relative improvement of more than 25% in mean reciprocal rank over previous SOTA models.
arXiv Detail & Related papers (2023-03-28T02:10:37Z)
- BiT: Robustly Binarized Multi-distilled Transformer [36.06192421902272]
We develop, for the first time, fully binarized transformer models that reach a practical level of accuracy, approaching a full-precision BERT baseline within as little as 5.9%.
arXiv Detail & Related papers (2022-05-25T19:01:54Z)