An Efficient GNNs-to-KANs Distillation via Self-Attention Dynamic Sampling with Potential for Consumer Electronics Edge Deployment
- URL: http://arxiv.org/abs/2509.00560v1
- Date: Sat, 30 Aug 2025 16:53:42 GMT
- Title: An Efficient GNNs-to-KANs Distillation via Self-Attention Dynamic Sampling with Potential for Consumer Electronics Edge Deployment
- Authors: Can Cui, Zilong Fu, Penghe Huang, Yuanyuan Li, Wu Deng, Dongyan Li
- Abstract summary: This paper introduces an innovative framework for knowledge distillation from Graph Neural Networks (GNNs) to Kolmogorov-Arnold Networks (KANs). Through the incorporation of learnable frequency bases and phase-shift mechanisms, FR-KAN significantly improves its nonlinear fitting capability while effectively reducing computational complexity. Experiments conducted on six real-world datasets demonstrate that SA-DSD achieves performance improvements of 3.05%-3.62% over three GNN teacher models and 15.61% over the FR-KAN+ model.
- Score: 12.385364522094612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge distillation (KD) is crucial for deploying deep learning models in resource-constrained edge environments, particularly within the consumer electronics sector, including smart home devices, wearable technology, and mobile terminals. These applications place higher demands on model compression and inference speed, necessitating the transfer of knowledge from Graph Neural Networks (GNNs) to more efficient Multi-Layer Perceptron (MLP) models. However, due to their fixed activation functions and fully connected architecture, MLPs face challenges in rapidly capturing the complex neighborhood dependencies learned by GNNs, thereby limiting their performance in edge environments. To address these limitations, this paper introduces an innovative GNNs-to-Kolmogorov-Arnold Networks (KANs) knowledge distillation framework: Self-Attention Dynamic Sampling Distillation (SA-DSD). This study improves Fourier KAN (FR-KAN) and replaces the MLP with the improved FR-KAN+ as the student model. Through the incorporation of learnable frequency bases and phase-shift mechanisms, along with algorithmic optimization, FR-KAN significantly improves its nonlinear fitting capability while effectively reducing computational complexity. Building on this, a margin-level sampling probability matrix, based on teacher-student prediction consistency, is constructed, and an adaptive weighted loss mechanism is designed to mitigate the performance degradation in the student model caused by the lack of explicit neighborhood aggregation. Extensive experiments conducted on six real-world datasets demonstrate that SA-DSD achieves performance improvements of 3.05%-3.62% over three GNN teacher models and 15.61% over the FR-KAN+ model. Moreover, when compared with key benchmark models, SA-DSD achieves a 16.96x reduction in parameter count and a 55.75% decrease in inference time.
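The abstract names two core ingredients without giving implementation detail: a Fourier-style KAN student with learnable frequency bases and phase shifts, and an adaptive weighted distillation loss driven by teacher-student prediction consistency. The sketch below is a minimal, hedged PyTorch illustration of those two ideas only; the class and function names (`FRKANPlusLayer`, `consistency_weighted_kd_loss`) and all hyperparameters are assumptions, not the authors' released code.

```python
# Minimal illustrative sketch (assumptions, not the authors' implementation):
# a Fourier-style KAN layer with learnable frequency bases / phase shifts,
# plus an adaptive weighted KD loss keyed to teacher-student consistency.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FRKANPlusLayer(nn.Module):
    """Hypothetical FR-KAN+-style layer: each input feature is expanded into
    sinusoidal terms whose frequencies and phases are trainable, then linearly
    combined into the output features."""

    def __init__(self, in_dim: int, out_dim: int, num_freq: int = 8):
        super().__init__()
        self.freq = nn.Parameter(torch.randn(in_dim, num_freq))   # learnable frequency bases
        self.phase = nn.Parameter(torch.zeros(in_dim, num_freq))  # learnable phase shifts
        self.mix = nn.Linear(in_dim * num_freq, out_dim)          # combine Fourier features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> sinusoidal features: (batch, in_dim, num_freq)
        z = torch.sin(x.unsqueeze(-1) * self.freq + self.phase)
        return self.mix(z.flatten(start_dim=1))


def consistency_weighted_kd_loss(student_logits, teacher_logits, labels,
                                 tau: float = 2.0, agree_weight: float = 0.5):
    """Adaptive weighted distillation loss (a rough stand-in for the paper's
    margin-level sampling matrix): nodes where teacher and student already agree
    are down-weighted, while disagreeing nodes are emphasized."""
    agree = (student_logits.argmax(-1) == teacher_logits.argmax(-1)).float()
    weights = agree * agree_weight + (1.0 - agree)

    soft_teacher = F.softmax(teacher_logits / tau, dim=-1)
    log_student = F.log_softmax(student_logits / tau, dim=-1)
    kd = F.kl_div(log_student, soft_teacher, reduction="none").sum(-1) * tau ** 2
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    return (weights * (kd + ce)).mean()
```

A student built by stacking such layers would be trained with this loss against soft predictions from a frozen GNN teacher; the actual SA-DSD margin-level sampling probabilities are more involved than this simple consistency weighting.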
Related papers
- SAFA-SNN: Sparsity-Aware On-Device Few-Shot Class-Incremental Learning with Fast-Adaptive Structure of Spiking Neural Network [19.73335869722781]
Continuous learning of novel classes is crucial for edge devices to preserve data privacy and maintain reliable performance in dynamic environments. In this work, we present an SNN-based method for on-device FSCIL, i.e., Sparsity-Aware and Fast-Adaptive SNN (SAFA-SNN).
arXiv Detail & Related papers (2025-10-04T03:21:31Z) - Deep Hierarchical Learning with Nested Subspace Networks [53.71337604556311]
We propose Nested Subspace Networks (NSNs) for large neural networks. NSNs enable a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets. We show that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier.
arXiv Detail & Related papers (2025-09-22T15:13:14Z) - NIRVANA: Structured pruning reimagined for large language models compression [50.651730342011014]
We introduce NIRVANA, a novel pruning method designed to balance immediate zero-shot accuracy preservation with robust fine-tuning. To further address the unique challenges posed by structured pruning, NIRVANA incorporates an adaptive sparsity allocation mechanism across layers and modules. Experiments conducted on Llama3, Qwen, and T5 models demonstrate that NIRVANA outperforms existing structured pruning methods under equivalent sparsity constraints.
arXiv Detail & Related papers (2025-09-17T17:59:00Z) - A Lightweight Deep Learning Model for Automatic Modulation Classification using Dual Path Deep Residual Shrinkage Network [0.0]
Automatic Modulation Classification (AMC) plays a key role in enhancing spectrum efficiency. There is a pressing need for lightweight AMC models that balance low complexity with high classification accuracy. This paper proposes a low-complexity, lightweight deep learning (DL) AMC model optimized for resource-constrained edge devices.
arXiv Detail & Related papers (2025-07-07T00:37:54Z) - Auto-Compressing Networks [59.83547898874152]
We introduce Auto-Compressing Networks (ACNs), an architectural variant where additive long feedforward connections from each layer replace traditional short residual connections. ACNs showcase a unique property we coin "auto-compression", the ability of a network to organically compress information during training. We find that ACNs exhibit enhanced noise robustness compared to residual networks, superior performance in low-data settings, and mitigate catastrophic forgetting.
arXiv Detail & Related papers (2025-06-11T13:26:09Z) - DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that, under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - DA-LIF: Dual Adaptive Leaky Integrate-and-Fire Model for Deep Spiking Neural Networks [5.832445095443944]
Spiking Neural Networks (SNNs) are valued for their ability to process spatio-temporal information efficiently. We propose the Dual Adaptive Leaky Integrate-and-Fire (DA-LIF) model, which introduces spatial and temporal tuning with independently learnable decays.
arXiv Detail & Related papers (2025-02-05T09:02:07Z) - Explore Activation Sparsity in Recurrent LLMs for Energy-Efficient Neuromorphic Computing [3.379854610429579]
Recurrent Large Language Models (R-LLMs) have proven effective in mitigating the complexity of self-attention. We propose a low-cost, training-free algorithm to sparsify R-LLMs' activations to enhance energy efficiency on neuromorphic hardware.
arXiv Detail & Related papers (2025-01-09T19:13:03Z) - HM-DF SNN: Transcending Conventional Online Learning with Advanced Training and Deployment [39.6783548791379]
Spiking Neural Networks (SNNs) are considered to have enormous potential in the future development of Artificial Intelligence. Current online learning frameworks cannot tackle the inseparability problem of temporally dependent gradients. We propose the Hybrid Mechanism-Driven Firing (HM-DF) model, a family of advanced models that respectively adopt different spiking calculation schemes.
arXiv Detail & Related papers (2024-10-10T02:39:22Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. Our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - lpSpikeCon: Enabling Low-Precision Spiking Neural Network Processing for Efficient Unsupervised Continual Learning on Autonomous Agents [14.916996986290902]
We propose lpSpikeCon, a novel methodology to enable low-precision SNN processing for efficient unsupervised continual learning.
Our lpSpikeCon can reduce weight memory of the SNN model by 8x (i.e., by judiciously employing 4-bit weights) for performing online training with unsupervised continual learning.
arXiv Detail & Related papers (2022-05-24T18:08:16Z) - Towards Efficient Point Cloud Graph Neural Networks Through Architectural Simplification [8.062534763028808]
We make a step towards improving the efficiency of graph neural network (GNN) models by making the observation that these GNN models are heavily limited by the representational power of their first, feature extracting, layer.
We find that it is possible to radically simplify these models so long as the feature extraction layer is retained with minimal degradation to model performance.
Our approach reduces memory consumption by 20x and latency by up to 9.9x for graph layers in models such as DGCNN; overall, we achieve speed-ups of up to 4.5x and peak memory reductions of
arXiv Detail & Related papers (2021-08-13T17:04:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.