GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation
- URL: http://arxiv.org/abs/2412.11180v1
- Date: Sun, 15 Dec 2024 13:18:56 GMT
- Title: GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation
- Authors: Ziang Zhou, Zhihao Ding, Jieming Shi, Qing Li, Shiqi Shen,
- Abstract summary: Graph Neural Networks (GNNs) are fundamental to graph-based learning and excel in node classification tasks.
Recent studies attempt to distill GNNs into multi-layer perceptrons (MLPs) for faster inference.
We propose TINED, a novel method that distills GNNs tos layer-wise through Teacher Injection with fine-tuning and Dirichlet Energy Distillation techniques.
- Score: 9.118347325106496
- License:
- Abstract: Graph Neural Networks (GNNs) are fundamental to graph-based learning and excel in node classification tasks. However, GNNs suffer from scalability issues due to the need for multi-hop data during inference, limiting their use in latency-sensitive applications. Recent studies attempt to distill GNNs into multi-layer perceptrons (MLPs) for faster inference. They typically treat GNN and MLP models as single units for distillation, insufficiently utilizing the fine-grained knowledge within GNN layers. In this paper, we propose TINED, a novel method that distills GNNs to MLPs layer-wise through Teacher Injection with fine-tuning and Dirichlet Energy Distillation techniques. We analyze key operations in GNN layers, feature transformation (FT) and graph propagation (GP), and identify that an FT performs the same computation as a fully-connected (FC) layer in MLPs. Thus, we propose directly injecting valuable teacher parameters of an FT in a GNN into an FC layer of the student MLP, assisted by fine-tuning. In TINED, FC layers in an MLP mirror the order of the corresponding FTs and GPs in GNN. We provide a theoretical bound on the approximation of GPs. Moreover, we observe that in a GNN layer, FT and GP operations often have opposing smoothing effects: GP is aggressive, while FT is conservative, in smoothing. Using Dirichlet energy, we design a DE ratio to quantify these smoothing effects and propose Dirichlet Energy Distillation to distill these characteristics from GNN layers to MLP layers. Extensive experiments demonstrate that TINED achieves superior performance over GNNs and state-of-the-art distillation methods under various settings across seven datasets. The code is in supplementary material.
Related papers
- Learning Accurate, Efficient, and Interpretable MLPs on Multiplex Graphs via Node-wise Multi-View Ensemble Distillation [29.70212915594676]
Multiplex Graph Neural Networks (MGNNs) have achieved advanced performance in various downstream tasks.
We propose Multiplex Graph-Free Graph Neural Networks (MGFNNs) and MGFNN+ to combine MGNNs' superior performance and Neural Networks' efficient inference.
arXiv Detail & Related papers (2025-02-09T11:55:14Z) - Large-Scale Spectral Graph Neural Networks via Laplacian Sparsification: Technical Report [21.288230563135055]
We propose a novel graph spectral sparsification method to approximate the propagation patterns of spectral Graph Neural Networks (GNNs)
Our method allows the application of linear layers on the input node features, enabling end-to-end training as well as the handling of raw features.
arXiv Detail & Related papers (2025-01-08T15:36:19Z) - Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference [53.38082028252104]
We introduce HG2M and HG2M+ to combine both HGNN's superior performance and relational's efficient inference.
HG2M directly trains students with node features as input and soft labels from teacher HGNNs as targets.
HG2Ms demonstrate a 379.24$times$ speedup in inference over HGNNs on the large-scale IGB-3M-19 dataset.
arXiv Detail & Related papers (2024-11-21T11:39:09Z) - Spiking Graph Neural Network on Riemannian Manifolds [51.15400848660023]
Graph neural networks (GNNs) have become the dominant solution for learning on graphs.
Existing spiking GNNs consider graphs in Euclidean space, ignoring the structural geometry.
We present a Manifold-valued Spiking GNN (MSG)
MSG achieves superior performance to previous spiking GNNs and energy efficiency to conventional GNNs.
arXiv Detail & Related papers (2024-10-23T15:09:02Z) - A Teacher-Free Graph Knowledge Distillation Framework with Dual
Self-Distillation [58.813991312803246]
We propose a Teacher-Free Graph Self-Distillation (TGS) framework that does not require any teacher model or GNNs during both training and inference.
TGS enjoys the benefits of graph topology awareness in training but is free from data dependency in inference.
arXiv Detail & Related papers (2024-03-06T05:52:13Z) - VQGraph: Rethinking Graph Representation Space for Bridging GNNs and
MLPs [97.63412451659826]
VQGraph learns a structure-aware tokenizer on graph data that can encode each node's local substructure as a discrete code.
VQGraph achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings.
arXiv Detail & Related papers (2023-08-04T02:58:08Z) - Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and
Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework [36.160251860788314]
We propose an efficient Full-Frequency GNN-to-MLP (FFG2M) distillation framework.
We factorize the knowledge learned by GNNs into low- and high-frequency components in the spectral domain.
We identify a potential information drowning problem for existing GNN-to-MLP distillation.
arXiv Detail & Related papers (2023-05-18T06:57:06Z) - Graph Neural Networks are Inherently Good Generalizers: Insights by
Bridging GNNs and MLPs [71.93227401463199]
This paper pinpoints the major source of GNNs' performance gain to their intrinsic capability, by introducing an intermediate model class dubbed as P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z) - On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD)
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
arXiv Detail & Related papers (2020-11-04T12:29:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.