ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs
- URL: http://arxiv.org/abs/2507.19031v1
- Date: Fri, 25 Jul 2025 07:35:09 GMT
- Title: ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs
- Authors: Weigang Lu, Ziyu Guan, Wei Zhao, Yaming Yang, Yujie Sun, Zheng Liang, Yibing Zhan, Dapeng Tao
- Abstract summary: We introduce a Progressive framework designed to offer flexible and on-demand trade-offs between inference cost and accuracy for GNN-to-MLP knowledge distillation. Our approach is validated through comprehensive experiments on eight real-world graph datasets.
- Score: 42.37669895235534
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: GNN-to-MLP (G2M) methods have emerged as a promising approach to accelerate Graph Neural Networks (GNNs) by distilling their knowledge into simpler Multi-Layer Perceptrons (MLPs). These methods bridge the gap between the expressive power of GNNs and the computational efficiency of MLPs, making them well-suited for resource-constrained environments. However, existing G2M methods are limited by their inability to flexibly adjust inference cost and accuracy dynamically, a critical requirement for real-world applications where computational resources and time constraints can vary significantly. To address this, we introduce a Progressive framework designed to offer flexible and on-demand trade-offs between inference cost and accuracy for GNN-to-MLP knowledge distillation (ProGMLP). ProGMLP employs a Progressive Training Structure (PTS), where multiple MLP students are trained in sequence, each building on the previous one. Furthermore, ProGMLP incorporates Progressive Knowledge Distillation (PKD) to iteratively refine the distillation process from GNNs to MLPs, and Progressive Mixup Augmentation (PMA) to enhance generalization by progressively generating harder mixed samples. Our approach is validated through comprehensive experiments on eight real-world graph datasets, demonstrating that ProGMLP maintains high accuracy while dynamically adapting to varying runtime scenarios, making it highly effective for deployment in diverse application settings.
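To make the three components concrete, the sketch below shows, in PyTorch, how a progressive GNN-to-MLP pipeline of this kind could be wired together: a cascade of MLP students distilled from precomputed GNN teacher logits, a mixup schedule whose samples get harder for later students, and a budgeted inference routine that runs only as many students as time allows. The class and function names, the alpha schedule, the loss weighting, and the budget rule are illustrative assumptions based on the abstract, not the authors' exact formulation.

```python
# Minimal sketch of the ideas described above: a cascade of MLP students (PTS),
# soft-label distillation from a GNN teacher (PKD), progressively harder mixup
# (PMA), and budgeted inference. All hyperparameters and schedules are
# illustrative assumptions, not the paper's exact formulation.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class StudentMLP(nn.Module):
    """A plain MLP student: sees only node features, no message passing."""

    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, num_classes),
        )

    def forward(self, x):
        return self.net(x)


def mixup(x, soft_targets, alpha):
    """Mix node features and teacher soft labels; a larger alpha concentrates
    lambda around 0.5, producing harder mixed samples."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * soft_targets + (1 - lam) * soft_targets[perm]
    return x_mix, y_mix


def train_progressive(x, y, teacher_logits, num_classes, num_students=3,
                      hid_dim=64, epochs=100, lr=1e-2, tau=2.0):
    """Train MLP students in sequence; each is distilled from the teacher's
    soft labels (teacher_logits are assumed precomputed by the GNN) and
    regularized toward the previous student's predictions."""
    teacher_soft = F.softmax(teacher_logits / tau, dim=-1)
    students, prev_logits = [], None
    for k in range(num_students):
        student = StudentMLP(x.size(1), hid_dim, num_classes)
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        alpha = 0.2 + 0.4 * k  # assumed schedule: later students see harder mixes
        for _ in range(epochs):
            x_mix, y_mix = mixup(x, teacher_soft, alpha)
            kd = F.kl_div(F.log_softmax(student(x_mix) / tau, dim=-1), y_mix,
                          reduction="batchmean") * tau ** 2
            logits = student(x)
            loss = F.cross_entropy(logits, y) + kd
            if prev_logits is not None:  # "each building on the previous one"
                loss = loss + F.mse_loss(logits, prev_logits)
            opt.zero_grad()
            loss.backward()
            opt.step()
        prev_logits = student(x).detach()
        students.append(student)
    return students


@torch.no_grad()
def predict_with_budget(students, x, budget):
    """On-demand trade-off: combine the logits of only as many students as the
    current inference budget allows."""
    logits = sum(student(x) for student in students[:max(1, budget)])
    return logits.argmax(dim=-1)
```

At deployment, the caller can then choose the budget per request, so a latency-critical query runs a single student while an offline job runs the full cascade.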
Related papers
- Heuristic Methods are Good Teachers to Distill MLPs for Graph Link Prediction [61.70012924088756]
Distilling Graph Neural Network (GNN) teachers into Multi-Layer Perceptron (MLP) students has emerged as an effective approach to achieve strong performance. However, existing distillation methods only use standard GNNs and overlook alternative teachers such as specialized models for link prediction (GNN4LP) and heuristic methods (e.g., common neighbors). This paper first explores the impact of different teachers in GNN-to-MLP distillation: stronger teachers do not always produce stronger students, while weaker heuristic methods can teach MLPs to near-GNN performance with drastically reduced training costs.
arXiv Detail & Related papers (2025-04-08T16:35:11Z) - Teaching MLPs to Master Heterogeneous Graph-Structured Knowledge for Efficient and Accurate Inference [53.38082028252104]
We introduce HG2M and HG2M+ to combine both HGNNs' superior performance and MLPs' efficient inference. HG2M directly trains students with node features as input and soft labels from teacher HGNNs as targets. HG2Ms demonstrate a 379.24× speedup in inference over HGNNs on the large-scale IGB-3M-19 dataset.
arXiv Detail & Related papers (2024-11-21T11:39:09Z) - AdaGMLP: AdaBoosting GNN-to-MLP Knowledge Distillation [15.505402580010104]
A new wave of methods, collectively known as GNN-to-MLP Knowledge Distillation, has emerged.
They aim to transfer GNN-learned knowledge to a more efficient MLP student.
These methods face challenges in situations with insufficient training data and incomplete test data.
We propose AdaGMLP, an AdaBoosting GNN-to-MLP Knowledge Distillation framework.
arXiv Detail & Related papers (2024-05-23T08:28:44Z) - MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models [33.86069537521178]
Fine-tuning a pre-trained language model (PLM) has emerged as the predominant strategy in many natural language processing applications. General approaches (e.g., quantization and distillation) have been widely studied to reduce the compute/memory cost of PLM fine-tuning. We propose one-shot compression techniques specifically designed for fine-tuning.
arXiv Detail & Related papers (2023-07-18T03:12:51Z) - Quantifying the Optimization and Generalization Advantages of Graph Neural Networks Over Multilayer Perceptrons [50.33260238739837]
Graph neural networks (GNNs) have demonstrated remarkable capabilities in learning from graph-structured data. However, there remains a lack of analysis comparing GNNs and MLPs from an optimization and generalization perspective.
arXiv Detail & Related papers (2023-06-24T10:21:11Z) - Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs [71.93227401463199]
This paper pinpoints the major source of GNNs' performance gain to their intrinsic generalization capability, by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
arXiv Detail & Related papers (2022-12-18T08:17:32Z) - SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP [46.52398427166938]
One promising inference acceleration direction is to distill the GNNs into message-passing-free student multi-layer perceptrons.
We introduce a novel structure-mixing knowledge distillation strategy to enhance the students' ability to learn structure information.
Our SA-MLP can consistently outperform the teacher GNNs, while maintaining faster inference speed.
arXiv Detail & Related papers (2022-10-18T05:55:36Z) - Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both the feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z) - Using Fitness Dependent Optimizer for Training Multi-layer Perceptron [13.280383503879158]
This study presents a novel training algorithm depending upon the recently proposed Fitness Dependent Optimizer (FDO).
The stability of this algorithm has been verified and performance-proofed in both the exploration and exploitation stages.
The proposed approach using FDO as a trainer outperforms other approaches that use different trainers on the datasets considered.
arXiv Detail & Related papers (2022-01-03T10:23:17Z)