<SOG_k>: One LLM Token for Explicit Graph Structural Understanding
- URL: http://arxiv.org/abs/2602.01771v1
- Date: Mon, 02 Feb 2026 07:55:09 GMT
- Title: <SOG_k>: One LLM Token for Explicit Graph Structural Understanding
- Authors: Jingyao Wu, Bin Lu, Zijun Di, Xiaoying Gan, Meng Jin, Luoyi Fu, Xinbing Wang, Chenghu Zhou
- Abstract summary: We propose to incorporate one special token <SOG_k> to fully represent the Structure Of Graph within a unified token space. <SOG_k> empowers LLMs to understand, generate, and reason in a concise and accurate manner.
- Score: 57.017902343605364
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models show great potential in unstructured data understanding, but still face significant challenges with graphs due to their structural hallucination. Existing approaches mainly either verbalize graphs into natural language, which leads to excessive token consumption and scattered attention, or transform graphs into trainable continuous embeddings (i.e., soft prompt), but exhibit severe misalignment with original text tokens. To solve this problem, we propose to incorporate one special token <SOG_k> to fully represent the Structure Of Graph within a unified token space, facilitating explicit topology input and structural information sharing. Specifically, we propose a topology-aware structural tokenizer that maps each graph topology into a highly selective single token. Afterwards, we construct a set of hybrid structure Question-Answering corpora to align new structural tokens with existing text tokens. With this approach, <SOG_k> empowers LLMs to understand, generate, and reason in a concise and accurate manner. Extensive experiments on five graph-level benchmarks demonstrate the superiority of our method, achieving a performance improvement of 9.9% to 41.4% compared to the baselines while exhibiting interpretability and consistency. Furthermore, our method provides a flexible extension to node-level tasks, enabling both global and local structural understanding. The codebase is publicly available at https://github.com/Jingyao-Wu/SOG.
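To ground the abstract's pipeline, here is a minimal, hypothetical sketch of a topology-aware structural tokenizer that quantizes a graph's topology into one special token string. The spectral descriptor, the stand-in codebook, and all function names are our illustrative assumptions; the authors' actual implementation lives in the linked repository.

```python
# Hypothetical sketch, not the paper's code: summarize topology as a fixed-size
# spectral descriptor, snap it to the nearest codebook entry, and emit "<SOG_k>".
import numpy as np
import networkx as nx

def topology_descriptor(g: nx.Graph, dim: int = 16) -> np.ndarray:
    """Fixed-size summary: sorted normalized-Laplacian eigenvalues, zero-padded."""
    eig = np.sort(nx.normalized_laplacian_spectrum(g))
    out = np.zeros(dim)
    n = min(dim, len(eig))
    out[:n] = eig[:n]
    return out

def structure_token(g: nx.Graph, codebook: np.ndarray) -> str:
    """Quantize the descriptor to its nearest codebook row; name that row's token."""
    d = topology_descriptor(g, codebook.shape[1])
    k = int(np.argmin(np.linalg.norm(codebook - d, axis=1)))
    return f"<SOG_{k}>"

# Usage: prepend the single structure token to the text prompt fed to the LLM.
codebook = np.random.default_rng(0).normal(size=(512, 16))  # stand-in for a learned codebook
g = nx.cycle_graph(6)
prompt = f"{structure_token(g, codebook)} How many nodes does this graph contain?"
```

The point the abstract stresses is that the whole topology costs one token of context, rather than a verbose edge-list verbalization or an unaligned soft prompt.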
Related papers
- Tokenization, Fusion and Decoupling: Bridging the Granularity Mismatch Between Large Language Models and Knowledge Graphs [20.946228883628013]
We propose KGT, a novel framework that uses dedicated entity tokens to enable efficient, full-space prediction. We first introduce specialized tokenization to construct feature representations at the level of dedicated entity tokens. We then fuse pre-trained structural and textual features into these unified embeddings via a relation-guided gating mechanism.
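As a rough illustration of the relation-guided gating this summary describes, the sketch below fuses structural and textual entity embeddings with a gate conditioned on the relation; the module name and shapes are our assumptions, not KGT's code.

```python
# Illustrative sketch of relation-guided gated fusion (not KGT's implementation).
import torch
import torch.nn as nn

class RelationGuidedGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(3 * dim, dim)

    def forward(self, struct_emb, text_emb, rel_emb):
        # The relation decides, per dimension, how much structural vs. textual
        # signal flows into the unified entity-token embedding.
        g = torch.sigmoid(self.gate(torch.cat([struct_emb, text_emb, rel_emb], dim=-1)))
        return g * struct_emb + (1.0 - g) * text_emb
```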
arXiv Detail & Related papers (2026-02-26T07:20:40Z)
- NAG: A Unified Native Architecture for Encoder-free Text-Graph Modeling in Language Models [33.49410203951687]
We argue that relying on external graph encoders is suboptimal for text-graphs. NAG (Native Architecture for Graphs) is a unified framework that internalizes graph processing within the language model. NAG achieves robust graph comprehension without the overhead of external encoders.
arXiv Detail & Related papers (2026-01-30T07:22:11Z)
- Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression [55.51959317490934]
Large language models (LLMs) have demonstrated promising capabilities in Text-Attributed Graph (TAG) understanding. We argue that graphs inherently contain rich structural and semantic information, and that their effective exploitation can unlock potential gains in LLM reasoning performance. We propose Homophily-aware Structural and Semantic Compression for LLMs (HS2C), a framework centered on exploiting graph homophily.
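A concrete signal a homophily-aware compressor could condition on is the edge homophily ratio; the helper below is our illustration of that quantity, not HS2C's code.

```python
# Sketch: edge homophily = fraction of edges whose endpoints share a label.
import networkx as nx

def edge_homophily(g: nx.Graph, labels: dict) -> float:
    same = sum(labels[u] == labels[v] for u, v in g.edges())
    return same / max(g.number_of_edges(), 1)
```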
arXiv Detail & Related papers (2026-01-13T03:35:18Z)
- GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning [50.40400074353263]
Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs. We introduce the Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture.
arXiv Detail & Related papers (2025-10-06T08:09:15Z)
- Integrating Structural and Semantic Signals in Text-Attributed Graphs with BiGTex [0.0]
BiGTex is a novel architecture that tightly integrates GNNs and LLMs through stacked Graph-Text Fusion Units. BiGTex achieves state-of-the-art performance in node classification and generalizes effectively to link prediction.
arXiv Detail & Related papers (2025-04-16T20:25:11Z)
- Graph Self-Supervised Learning with Learnable Structural and Positional Encodings [39.20899720477907]
We introduce GenHopNet, a GNN framework that integrates a k-hop message-passing scheme. We also propose a structural- and positional-aware GSSL framework that incorporates topological information throughout the learning process. Our work significantly advances GSSL's capability in distinguishing graphs with similar local structures but different global topologies.
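For intuition about the k-hop message-passing scheme mentioned above, here is a minimal dense-matrix version; the normalization choice and names are our assumptions, not GenHopNet itself.

```python
# Sketch: concatenate features aggregated from 1..k-hop neighborhoods.
import torch

def k_hop_aggregate(adj: torch.Tensor, x: torch.Tensor, k: int) -> torch.Tensor:
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    a_norm = adj / deg                # row-normalized adjacency
    hops, h = [], x
    for _ in range(k):
        h = a_norm @ h                # propagate one hop further
        hops.append(h)
    return torch.cat(hops, dim=-1)    # shape [n, k * feature_dim]
```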
arXiv Detail & Related papers (2025-02-22T14:10:06Z)
- Each Graph is a New Language: Graph Learning with LLMs [9.22463167477865]
We present Graph-Defined Language for Large Language Model (GDL4LLM) to transfer powerful language understanding capabilities to graph-structured data. GDL4LLM translates graphs into a graph language corpus instead of graph descriptions and pre-trains LLMs on this corpus to adequately understand graph structures.
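One plausible reading of a "graph language corpus" is random walks serialized as sentences of node tokens; the corpus format below is our assumption, not GDL4LLM's.

```python
# Sketch: serialize random walks as token sentences an LLM can be pre-trained on.
import random
import networkx as nx

def walk_corpus(g: nx.Graph, walks_per_node: int = 2, length: int = 6, seed: int = 0):
    rng = random.Random(seed)
    corpus = []
    for start in g.nodes():
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(length - 1):
                nbrs = list(g.neighbors(node))
                if not nbrs:          # dead end: cut the walk short
                    break
                node = rng.choice(nbrs)
                walk.append(node)
            corpus.append(" ".join(f"<node_{n}>" for n in walk))
    return corpus
```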
arXiv Detail & Related papers (2025-01-20T13:20:41Z)
- Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing.
The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge.
It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
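Those two steps read roughly like the sketch below, which drops the lowest-similarity edges and then pulls each node's embedding toward its kept neighbors; the thresholding rule and loss are our stand-ins, not GSSC's actual objective.

```python
# Illustrative sketch (not GSSC's code).
import torch
import torch.nn.functional as F

def sparsify(adj: torch.Tensor, x: torch.Tensor, keep_ratio: float = 0.7) -> torch.Tensor:
    """Keep only the highest-similarity fraction of existing edges."""
    sim = F.cosine_similarity(x.unsqueeze(1), x.unsqueeze(0), dim=-1)
    thresh = torch.quantile(sim[adj > 0], 1.0 - keep_ratio)
    return adj * (sim >= thresh)

def neighborhood_contrast(z: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """Crude self-contrasting target: agree with the mean of kept neighbors."""
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    neigh_mean = (adj @ z) / deg
    return (1.0 - F.cosine_similarity(z, neigh_mean, dim=-1)).mean()
```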
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
- A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior.
Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks.
GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
- GraphEdit: Large Language Models for Graph Structure Learning [14.16155596597421]
Graph Structure Learning (GSL) focuses on capturing intrinsic dependencies and interactions among nodes in graph-structured data. Existing GSL methods heavily depend on explicit graph structural information as supervision signals. We propose GraphEdit, an approach that leverages large language models (LLMs) to learn complex node relationships in graph-structured data.
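In that spirit, edge supervision could come from querying an LLM about node pairs; the prompt template below is purely our illustration of the idea, not GraphEdit's.

```python
# Illustrative prompt construction only; the wording is an assumption.
def edge_prompt(text_u: str, text_v: str) -> str:
    return (
        "Node A: " + text_u + "\n"
        "Node B: " + text_v + "\n"
        "Should these two nodes be connected? Answer yes or no."
    )
```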
arXiv Detail & Related papers (2024-02-23T08:29:42Z)
- Self-organization Preserved Graph Structure Learning with Principle of Relevant Information [72.83485174169027]
PRI-GSL is a Graph Structure Learning framework for identifying the self-organization and revealing the hidden structure.
PRI-GSL learns a structure that contains the most relevant yet least redundant information quantified by von Neumann entropy and Quantum Jensen-Shannon divergence.
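The von Neumann entropy mentioned above can be computed from the normalized-Laplacian spectrum; the helper below is a standard formulation offered as illustration, not PRI-GSL's code.

```python
# Sketch: von Neumann graph entropy from normalized-Laplacian eigenvalues.
import numpy as np
import networkx as nx

def von_neumann_entropy(g: nx.Graph) -> float:
    lam = np.clip(nx.normalized_laplacian_spectrum(g), 0, None)
    rho = lam / lam.sum()             # eigenvalues of the density matrix L / tr(L)
    rho = rho[rho > 0]                # 0 * log 0 = 0 by convention
    return float(-(rho * np.log(rho)).sum())
```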
arXiv Detail & Related papers (2022-12-30T16:02:02Z)