Tokenization, Fusion and Decoupling: Bridging the Granularity Mismatch Between Large Language Models and Knowledge Graphs
- URL: http://arxiv.org/abs/2602.22698v1
- Date: Thu, 26 Feb 2026 07:20:40 GMT
- Title: Tokenization, Fusion and Decoupling: Bridging the Granularity Mismatch Between Large Language Models and Knowledge Graphs
- Authors: Siyue Su, Jian Yang, Bo Li, Guanglin Niu
- Abstract summary: We propose KGT, a novel framework that uses dedicated entity tokens to enable efficient, full-space prediction. We first introduce specialized tokenization to construct feature representations at the level of dedicated entity tokens. We then fuse pre-trained structural and textual features into these unified embeddings via a relation-guided gating mechanism.
- Score: 20.946228883628013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Leveraging Large Language Models (LLMs) for Knowledge Graph Completion (KGC) is promising but hindered by a fundamental granularity mismatch. LLMs operate on fragmented token sequences, whereas entities are the fundamental units in knowledge graph (KG) scenarios. Existing approaches typically constrain predictions to limited candidate sets or align entities with the LLM's vocabulary by pooling multiple tokens or decomposing entities into fixed-length token sequences, which fails to capture both the semantic meaning of the text and the structural integrity of the graph. To address this, we propose KGT, a novel framework that uses dedicated entity tokens to enable efficient, full-space prediction. Specifically, we first introduce specialized tokenization to construct feature representations at the level of dedicated entity tokens. We then fuse pre-trained structural and textual features into these unified embeddings via a relation-guided gating mechanism, avoiding training from scratch. Finally, we implement decoupled prediction by leveraging independent heads to separate and combine semantic and structural reasoning. Experimental results show that KGT consistently outperforms state-of-the-art methods across multiple benchmarks.
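Based only on the abstract, the relation-guided gating and decoupled prediction heads might be sketched numerically as follows. All names, dimensions, and weights here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
dim, n_entities = 8, 100

# Pre-trained structural and textual features for one entity, plus the
# relation embedding of the query (placeholder random vectors).
struct_emb = rng.normal(size=dim)
text_emb = rng.normal(size=dim)
rel_emb = rng.normal(size=dim)

# Relation-guided gate: the relation decides how much structural vs.
# textual signal enters the unified entity-token embedding.
W_gate = rng.normal(size=(dim, 3 * dim)) * 0.1
gate = sigmoid(W_gate @ np.concatenate([struct_emb, text_emb, rel_emb]))
fused = gate * struct_emb + (1.0 - gate) * text_emb

# Decoupled prediction: independent semantic and structural heads each
# score the full entity space, and their logits are combined.
W_semantic = rng.normal(size=(n_entities, dim)) * 0.1
W_structural = rng.normal(size=(n_entities, dim)) * 0.1
logits = W_semantic @ fused + W_structural @ fused

print(fused.shape, logits.shape)  # (8,) (100,)
```

Note the full-space aspect: the heads score all `n_entities` at once rather than a restricted candidate set, which is the efficiency claim the abstract makes.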
Related papers
- <SOG_k>: One LLM Token for Explicit Graph Structural Understanding [57.017902343605364]
We propose to incorporate one special token, <SOG_k>, to fully represent the Structure Of Graph within a unified token space. <SOG_k> empowers LLMs to understand, generate, and reason in a concise and accurate manner.
arXiv Detail & Related papers (2026-02-02T07:55:09Z)
- Knowledge Graph Completion with Relation-Aware Anchor Enhancement [50.50944396454757]
We propose a relation-aware anchor enhanced knowledge graph completion method (RAA-KGC). We first generate anchor entities within the relation-aware neighborhood of the head entity. Then, by pulling the query embedding towards the neighborhoods of the anchors, it is tuned to be more discriminative for target entity matching.
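Read from the summary alone, the "pull toward anchor neighborhoods" step might look like the following toy update. The interpolation rule and all values are assumptions for illustration; the paper's actual training objective may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8

# A query embedding and embeddings of anchor entities drawn from the
# head entity's relation-aware neighborhood (placeholder vectors).
query = rng.normal(size=dim)
anchors = rng.normal(size=(4, dim))
centroid = anchors.mean(axis=0)

# Pull the query toward the anchor centroid so it lands nearer the
# neighborhood where the target entity is expected to lie.
alpha = 0.3  # pull strength (illustrative)
pulled = (1.0 - alpha) * query + alpha * centroid

print(np.linalg.norm(pulled - centroid) < np.linalg.norm(query - centroid))
# True: the pulled query is strictly closer to the anchor neighborhood
```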
arXiv Detail & Related papers (2025-04-08T15:22:08Z)
- Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization [26.9153121765435]
Large language models (LLMs) have demonstrated remarkable in-context learning (ICL) abilities. This paper investigates how ICL emerges and the impact of the pre-training phase on ICL. Our theory is supported by experiments on numerical linear dynamic systems, synthetic GINC, and real-world language datasets.
arXiv Detail & Related papers (2025-02-24T10:26:29Z)
- Unifying Structure and Language Semantic for Efficient Contrastive Knowledge Graph Completion with Structured Entity Anchors [0.3913403111891026]
The goal of knowledge graph completion (KGC) is to predict missing links in a KG using trained facts that are already known.
We propose a novel method to effectively unify structure information and language semantics without losing the power of inductive reasoning.
arXiv Detail & Related papers (2023-11-07T11:17:55Z)
- SpEL: Structured Prediction for Entity Linking [5.112679200269861]
We revisit the use of structured prediction for entity linking which classifies each individual input token as an entity, and aggregates the token predictions.
Our system, called SpEL, is a state-of-the-art entity linking system that uses some new ideas to apply structured prediction to the task of entity linking.
Our experiments show that we can outperform the state-of-the-art on the commonly used AIDA benchmark dataset for entity linking to Wikipedia.
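The classify-each-token-then-aggregate idea above can be sketched with a toy aggregation function. The label scheme and merge rule are simplifying assumptions, not SpEL's actual pipeline:

```python
def aggregate_token_predictions(tokens, labels):
    """Merge consecutive tokens that share a non-"O" entity label into
    entity spans; "O" marks tokens linked to no entity."""
    spans = []
    current_entity, current_tokens = None, []
    for tok, lab in zip(tokens, labels):
        if lab == current_entity and lab != "O":
            current_tokens.append(tok)  # extend the running span
        else:
            if current_entity not in (None, "O"):
                spans.append((" ".join(current_tokens), current_entity))
            current_entity, current_tokens = lab, [tok]
    if current_entity not in (None, "O"):
        spans.append((" ".join(current_tokens), current_entity))
    return spans

tokens = ["Barack", "Obama", "visited", "Berlin", "."]
labels = ["Barack_Obama", "Barack_Obama", "O", "Berlin", "O"]
print(aggregate_token_predictions(tokens, labels))
# [('Barack Obama', 'Barack_Obama'), ('Berlin', 'Berlin')]
```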
arXiv Detail & Related papers (2023-10-23T08:24:35Z)
- Schema First! Learn Versatile Knowledge Graph Embeddings by Capturing Semantics with MASCHInE [3.174882428337821]
Knowledge graph embedding models (KGEMs) have gained considerable traction in recent years.
In this work, we design protographs -- small, modified versions of a KG that leverage RDF/S information.
The learnt protograph-based embeddings are meant to encapsulate the semantics of a KG, and can be leveraged in learning KGEs that, in turn, also better capture semantics.
arXiv Detail & Related papers (2023-06-06T13:22:54Z)
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures [51.68385617116854]
Transformer-based language models are effective but complex, and understanding their inner workings and reasoning mechanisms is a significant challenge. We introduce a family of synthetic CFGs that produce hierarchical rules, capable of generating lengthy sentences. We demonstrate that generative models like GPT can accurately learn and reason over CFG-defined hierarchies and generate sentences based on them.
arXiv Detail & Related papers (2023-05-23T04:28:16Z)
- Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhanced Encoder (KEE) is proposed to leverage scene graph knowledge (SGK) as input to further enhance structured representations.
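One common way a scene graph can guide semantic negatives is by swapping the roles in a (subject, predicate, object) triple; a minimal sketch, where the example data and the swap rule are illustrative assumptions rather than the paper's exact procedure:

```python
def make_semantic_negative(triple):
    """Swap subject and object to build a hard negative that reuses the
    same words but changes the structured meaning."""
    subj, pred, obj = triple
    return (obj, pred, subj)

positive = ("horse", "eating", "grass")
negative = make_semantic_negative(positive)
print(negative)  # ('grass', 'eating', 'horse')
```

Because the negative shares the positive's vocabulary, a bag-of-words matcher cannot tell them apart; the contrast forces the model to encode who-does-what-to-whom structure.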
arXiv Detail & Related papers (2023-05-06T03:57:05Z)
- Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z)
- Inductive Learning on Commonsense Knowledge Graph Completion [89.72388313527296]
Commonsense knowledge graph (CKG) is a special type of knowledge graph (KG) where entities are composed of free-form text.
We propose to study the inductive learning setting for CKG completion, where unseen entities may be present at test time.
InductivE significantly outperforms state-of-the-art baselines in both standard and inductive settings on ATOMIC and ConceptNet benchmarks.
arXiv Detail & Related papers (2020-09-19T16:10:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.