Related papers: Enhancing Transformer with GNN Structural Knowledge via Distillation: A Novel Approach

URL: http://arxiv.org/abs/2503.01888v1
Date: Thu, 27 Feb 2025 05:14:47 GMT
Title: Enhancing Transformer with GNN Structural Knowledge via Distillation: A Novel Approach
Authors: Zhihua Duan, Jialin Wang
Abstract summary: This paper proposes a novel knowledge distillation framework that transfers multiscale structural knowledge from GNN teacher models to Transformer student models. The framework effectively bridges the architectural gap between GNNs and Transformers through micro-macro distillation losses and multiscale feature alignment.
Score: 1.4582633500696451
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Integrating the structural inductive biases of Graph Neural Networks (GNNs) with the global contextual modeling capabilities of Transformers represents a pivotal challenge in graph representation learning. While GNNs excel at capturing localized topological patterns through message-passing mechanisms, their inherent limitations in modeling long-range dependencies and parallelizability hinder their deployment in large-scale scenarios. Conversely, Transformers leverage self-attention mechanisms to achieve global receptive fields but struggle to inherit the intrinsic graph structural priors of GNNs. This paper proposes a novel knowledge distillation framework that systematically transfers multiscale structural knowledge from GNN teacher models to Transformer student models, offering a new perspective on addressing the critical challenges in cross-architectural distillation. The framework effectively bridges the architectural gap between GNNs and Transformers through micro-macro distillation losses and multiscale feature alignment. This work establishes a new paradigm for inheriting graph structural biases in Transformer architectures, with broad application prospects.
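Illustrative sketch: the abstract names micro-macro distillation losses but does not define them. The Python below is one plausible reading, not the paper's formulation. It assumes the micro term is a per-node mean squared error between teacher and student embeddings and the macro term is a mean squared error between mean-pooled graph embeddings. The function names, the choice of MSE, the mean pooling, and the weight `alpha` are all hypothetical.

```python
# Illustrative sketch only: these loss definitions are assumptions,
# not the formulation used in the paper.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mean_pool(node_embeddings):
    """Average node embeddings into a single graph-level vector."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(node[d] for node in node_embeddings) / n for d in range(dim)]

def micro_loss(teacher_nodes, student_nodes):
    """Node-level ("micro") term: per-node MSE, averaged over nodes."""
    per_node = [mse(t, s) for t, s in zip(teacher_nodes, student_nodes)]
    return sum(per_node) / len(per_node)

def macro_loss(teacher_nodes, student_nodes):
    """Graph-level ("macro") term: MSE between mean-pooled embeddings."""
    return mse(mean_pool(teacher_nodes), mean_pool(student_nodes))

def distillation_loss(teacher_nodes, student_nodes, alpha=0.5):
    """Weighted sum of both terms; alpha is a hypothetical trade-off weight."""
    micro = micro_loss(teacher_nodes, student_nodes)
    macro = macro_loss(teacher_nodes, student_nodes)
    return alpha * micro + (1.0 - alpha) * macro

# Toy embeddings for a two-node graph (GNN teacher vs. Transformer student).
teacher = [[1.0, 0.0], [0.0, 1.0]]
student_zero = [[0.0, 0.0], [0.0, 0.0]]
student_swapped = [[0.0, 1.0], [1.0, 0.0]]

print(distillation_loss(teacher, student_zero))
print(micro_loss(teacher, student_swapped), macro_loss(teacher, student_swapped))
```

In this toy setup the swapped student matches the teacher at the graph level (macro term of zero) while every node embedding is wrong (non-zero micro term), which is the kind of mismatch a combined loss is meant to catch.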

Related papers

Evolutionary Router Feature Generation for Zero-Shot Graph Anomaly Detection with Mixture-of-Experts [60.60414602796664]
We propose a novel MoE framework with evolutionary router feature generation (EvoFG) for zero-shot GAD. EvoFG consistently outperforms state-of-the-art baselines, achieving strong and stable zero-shot GAD performance.
arXiv Detail & Related papers (2026-02-12T06:16:51Z)
Plain Transformers are Surprisingly Powerful Link Predictors [57.01966734467712]
Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies. While Graph Neural Networks (GNNs) are the standard solution, state-of-the-art pipelines often rely on explicit structurals or memory-intensive node embeddings. We present PENCIL, an encoder-only plain Transformer that replaces hand-crafted priors with attention over sampled local subgraphs.
arXiv Detail & Related papers (2026-02-02T02:45:52Z)
Parameter-Free Structural-Diversity Message Passing for Graph Neural Networks [8.462209415744098]
Graph Neural Networks (GNNs) have shown remarkable performance in structured data modeling tasks such as node classification. This paper proposes a parameter-free graph neural network framework based on structural diversity. The framework is inspired by structural diversity theory and designs a unified structural-diversity message passing mechanism.
arXiv Detail & Related papers (2025-08-27T13:42:45Z)
Graded Transformers: A Symbolic-Geometric Approach to Structured Learning [0.0]
We introduce a novel class of sequence models that embed inductive biases through grading transformations on vector spaces. The Graded Transformer holds transformative potential for hierarchical learning and neurosymbolic reasoning. This work advances structured deep learning by fusing geometric and algebraic principles with attention mechanisms.
arXiv Detail & Related papers (2025-07-27T02:34:08Z)
DuoFormer: Leveraging Hierarchical Representations by Local and Global Attention Vision Transformer [1.456352735394398]
We propose a novel hierarchical transformer model that adeptly integrates the feature extraction capabilities of Convolutional Neural Networks (CNNs) with the advanced representational potential of Vision Transformers (ViTs). Addressing the lack of inductive biases and dependence on extensive training datasets in ViTs, our model employs a CNN backbone to generate hierarchical visual representations. These representations are adapted for transformer input through an innovative patch tokenization process, preserving the inherited multi-scale inductive biases.
arXiv Detail & Related papers (2025-06-15T22:42:57Z)
Global graph features unveiled by unsupervised geometric deep learning [0.0]
We introduce GAUDI (Graph Autoencoder Uncovering Descriptive Information), a novel geometric unsupervised deep learning framework. GAUDI employs an innovative hourglass architecture with hierarchical pooling and upsampling layers, linked through skip connections to preserve connectivity information. We demonstrate its power across multiple applications, including modeling small-world networks, characterizing assemblies from super-resolution microscopy, analyzing collective motion in the Vicsek model, and capturing age changes in brain connectivity.
arXiv Detail & Related papers (2025-03-07T16:38:41Z)
Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers [0.0]
We reformulate the Transformer's attention mechanism as a graph operation. We introduce Sparse GIN-Attention, a fine-tuning approach that employs sparse GINs.
arXiv Detail & Related papers (2025-01-04T22:30:21Z)
SGFormer: Single-Layer Graph Transformers with Approximation-Free Linear Complexity [74.51827323742506]
We evaluate the necessity of adopting multi-layer attentions in Transformers on graphs. We show that multi-layer propagation can be reduced to one-layer propagation, with the same capability for representation learning. It suggests a new technical path for building powerful and efficient Transformers on graphs.
arXiv Detail & Related papers (2024-09-13T17:37:34Z)
Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing. The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge. It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
A Pure Transformer Pretraining Framework on Text-attributed Graphs [50.833130854272774]
We introduce a feature-centric pretraining perspective by treating graph structure as a prior. Our framework, Graph Sequence Pretraining with Transformer (GSPT), samples node contexts through random walks. GSPT can be easily adapted to both node classification and link prediction, demonstrating promising empirical success on various datasets.
arXiv Detail & Related papers (2024-06-19T22:30:08Z)
Automatic Graph Topology-Aware Transformer [50.2807041149784]
We build a comprehensive graph Transformer search space with the micro-level and macro-level designs. EGTAS evolves graph Transformer topologies at the macro level and graph-aware strategies at the micro level. We demonstrate the efficacy of EGTAS across a range of graph-level and node-level tasks.
arXiv Detail & Related papers (2024-05-30T07:44:31Z)
Todyformer: Towards Holistic Dynamic Graph Transformers with Structure-Aware Tokenization [6.799413002613627]
Todyformer is a novel Transformer-based neural network tailored for dynamic graphs. It unifies the local encoding capacity of Message-Passing Neural Networks (MPNNs) with the global encoding of Transformers. We show that Todyformer consistently outperforms the state-of-the-art methods for downstream tasks.
arXiv Detail & Related papers (2024-02-02T23:05:30Z)
SGFormer: Simplifying and Empowering Transformers for Large-Graph Representations [75.71298846760303]
We show that a one-layer attention can bring up surprisingly competitive performance across node property prediction benchmarks. We frame the proposed scheme as Simplified Graph Transformers (SGFormer), which is empowered by a simple attention model. We believe the proposed methodology alone enlightens a new technical path of independent interest for building Transformers on large graphs.
arXiv Detail & Related papers (2023-06-19T08:03:25Z)
Dynamic Graph Representation Learning via Edge Temporal States Modeling and Structure-reinforced Transformer [5.093187534912688]
We introduce the Recurrent Structure-reinforced Graph Transformer (RSGT), a novel framework for dynamic graph representation learning. RSGT captures temporal node representations encoding both graph topology and evolving dynamics through a recurrent learning paradigm. We show RSGT's superior performance in discrete dynamic graph representation learning, consistently outperforming existing methods in dynamic link prediction tasks.
arXiv Detail & Related papers (2023-04-20T04:12:50Z)
CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.