Head-Tail-Aware KL Divergence in Knowledge Distillation for Spiking Neural Networks
- URL: http://arxiv.org/abs/2504.20445v1
- Date: Tue, 29 Apr 2025 05:36:32 GMT
- Title: Head-Tail-Aware KL Divergence in Knowledge Distillation for Spiking Neural Networks
- Authors: Tianqing Zhang, Zixin Zhu, Kairong Yu, Hongwei Wang
- Abstract summary: Spiking Neural Networks (SNNs) have emerged as a promising approach for energy-efficient computation. SNNs often exhibit a performance gap when compared to Artificial Neural Networks (ANNs).
- Score: 4.943844247308908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spiking Neural Networks (SNNs) have emerged as a promising approach for energy-efficient and biologically plausible computation. However, due to limitations in existing training methods and inherent model constraints, SNNs often exhibit a performance gap when compared to Artificial Neural Networks (ANNs). Knowledge distillation (KD) has been explored as a technique to transfer knowledge from ANN teacher models to SNN student models to mitigate this gap. Traditional KD methods typically use Kullback-Leibler (KL) divergence to align output distributions. However, conventional KL-based approaches fail to fully exploit the unique characteristics of SNNs, as they tend to overemphasize high-probability predictions while neglecting low-probability ones, leading to suboptimal generalization. To address this, we propose Head-Tail Aware Kullback-Leibler (HTA-KL) divergence, a novel KD method for SNNs. HTA-KL introduces a cumulative probability-based mask to dynamically distinguish between high- and low-probability regions. It assigns adaptive weights to ensure balanced knowledge transfer, enhancing overall performance. By integrating forward KL (FKL) and reverse KL (RKL) divergence, our method effectively aligns both the head and tail regions of the distribution. We evaluate our method on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Our method outperforms existing methods on most datasets with fewer timesteps.
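To make the mechanism concrete, the following is a minimal PyTorch sketch of one plausible reading of the abstract: a cumulative-probability mask splits the temperature-softened teacher distribution into head and tail regions, forward KL is applied to the head, reverse KL to the tail, and the two terms are combined with fixed weights. The threshold `head_mass`, temperature `tau`, and weights `alpha`/`beta` are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def hta_kl_loss(student_logits, teacher_logits, tau=4.0, head_mass=0.9,
                alpha=1.0, beta=1.0):
    """Sketch of a head-tail-aware KL distillation loss (assumed formulation)."""
    # Temperature-softened distributions, as in standard knowledge distillation.
    p_t = F.softmax(teacher_logits / tau, dim=-1)
    log_p_t = F.log_softmax(teacher_logits / tau, dim=-1)
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    p_s = log_p_s.exp()

    # Cumulative-probability mask: the classes carrying the first `head_mass`
    # of teacher probability form the "head"; the remaining classes are the "tail".
    sorted_p, idx = torch.sort(p_t, dim=-1, descending=True)
    cum_p = torch.cumsum(sorted_p, dim=-1)
    head_sorted = cum_p <= head_mass
    head_sorted[..., 0] = True  # always keep the top-1 class in the head
    head_mask = torch.zeros_like(p_t).scatter(-1, idx, head_sorted.to(p_t.dtype))
    tail_mask = 1.0 - head_mask

    # Forward KL (teacher-led) on the head, reverse KL (student-led) on the tail.
    fkl = (p_t * (log_p_t - log_p_s) * head_mask).sum(dim=-1)
    rkl = (p_s * (log_p_s - log_p_t) * tail_mask).sum(dim=-1)

    # tau**2 restores the gradient scale, as in Hinton-style distillation.
    return (alpha * fkl + beta * rkl).mean() * tau ** 2
```

In a full KD pipeline this term would typically be added to the usual cross-entropy loss on the ground-truth labels; the paper's actual masking rule and adaptive weighting may differ from the fixed choices above.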
Related papers
- Free-Knots Kolmogorov-Arnold Network: On the Analysis of Spline Knots and Advancing Stability [16.957071012748454]
Kolmogorov-Arnold Neural Networks (KANs) have gained significant attention in the machine learning community.
However, their implementation often suffers from poor training stability and a large number of trainable parameters.
In this work, we analyze the behavior of KANs through the lens of spline knots and derive lower and upper bounds on the number of knots in B-spline-based KANs.
arXiv Detail & Related papers (2025-01-16T04:12:05Z)
- Discriminative and Consistent Representation Distillation [6.24302896438145]
Discriminative and Consistent Distillation (DCD) employs a contrastive loss along with a consistency regularization to minimize the discrepancy between the distributions of teacher and student representations. Our method introduces learnable temperature and bias parameters that adapt during training to balance these complementary objectives.
arXiv Detail & Related papers (2024-07-16T14:53:35Z)
- BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation [20.34272550256856]
Spiking neural networks (SNNs) mimic biological neural systems to convey information via discrete spikes.
Our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets.
arXiv Detail & Related papers (2024-07-12T08:17:24Z)
- CADE: Cosine Annealing Differential Evolution for Spiking Neural Network [3.933578042941731]
Spiking neural networks (SNNs) have gained prominence for their potential in neuromorphic computing and energy-efficient artificial intelligence.
This paper attempts to tackle the challenges by introducing Cosine Annealing Differential Evolution (CADE).
CADE modulates the mutation factor (F) and crossover rate (CR) of differential evolution for the SNN model, i.e., Spiking Element Wise (SEW) ResNet (a minimal sketch of such a cosine-annealed schedule appears after this list).
arXiv Detail & Related papers (2024-06-04T14:24:35Z)
- Accelerated Linearized Laplace Approximation for Bayesian Deep Learning [34.81292720605279]
We develop a Nystrom approximation to neural tangent kernels (NTKs) to accelerate LLA.
Our method benefits from the capability of popular deep learning libraries for forward mode automatic differentiation.
Our method can even scale up to architectures like vision transformers.
arXiv Detail & Related papers (2022-10-23T07:49:03Z)
- ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret [97.73233271730616]
Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies).
DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).
We show that a deep learning version of ESCHER outperforms the prior state of the art -- DREAM and neural fictitious self play (NFSP) -- and the difference becomes dramatic as game size increases.
arXiv Detail & Related papers (2022-06-08T18:43:45Z)
- Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network.
In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which achieves high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation [51.66271681532262]
Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
arXiv Detail & Related papers (2021-08-26T14:01:04Z)
- Ensembles of Spiking Neural Networks [0.3007949058551534]
This paper demonstrates how to construct ensembles of spiking neural networks producing state-of-the-art results.
We achieve classification accuracies of 98.71%, 100.0%, and 99.09% on the MNIST, NMNIST, and DVS Gesture datasets, respectively.
We formalize spiking neural networks as GLM predictors, identifying a suitable representation for their target domain.
arXiv Detail & Related papers (2020-10-15T17:45:18Z)
- Neural Networks Enhancement with Logical Knowledge [83.9217787335878]
We propose an extension of KENN for relational data.
The results show that KENN is capable of increasing the performance of the underlying neural network even in the presence of relational data.
arXiv Detail & Related papers (2020-09-13T21:12:20Z)
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
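As noted in the CADE entry above, the method cosine-anneals the differential-evolution mutation factor (F) and crossover rate (CR). The sketch below shows what such a schedule can look like inside a plain DE/rand/1 loop; the schedule bounds, the minimization objective, and the helper names are illustrative assumptions rather than CADE's published settings, in which the candidate vectors encode parameters of an SEW ResNet SNN.

```python
import math
import random

def cosine_anneal(t, t_max, v_min, v_max):
    """Cosine schedule from v_max at t = 0 down to v_min at t = t_max."""
    return v_min + 0.5 * (v_max - v_min) * (1.0 + math.cos(math.pi * t / t_max))

def de_generation(population, fitness, gen, max_gen):
    """One differential-evolution generation with cosine-annealed F and CR (sketch)."""
    F = cosine_anneal(gen, max_gen, 0.1, 0.9)   # mutation factor, annealed per generation
    CR = cosine_anneal(gen, max_gen, 0.1, 0.9)  # crossover rate, annealed per generation
    dim = len(population[0])
    next_pop = []
    for i, target in enumerate(population):
        # DE/rand/1 mutation from three distinct individuals other than the target.
        a, b, c = random.sample([p for j, p in enumerate(population) if j != i], 3)
        mutant = [a[d] + F * (b[d] - c[d]) for d in range(dim)]
        # Binomial crossover with one forced dimension so the trial differs from the target.
        j_rand = random.randrange(dim)
        trial = [mutant[d] if (random.random() < CR or d == j_rand) else target[d]
                 for d in range(dim)]
        # Greedy selection (assuming a minimization objective).
        next_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return next_pop
```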