NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
- URL: http://arxiv.org/abs/2310.19820v1
- Date: Tue, 24 Oct 2023 04:27:51 GMT
- Title: NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
- Authors: Shunyao Zhang, Yonggan Fu, Shang Wu, Jyotikrishna Dass, Haoran You,
Yingyan (Celine) Lin
- Abstract summary: We propose a framework called NetDistiller to boost the achievable accuracy of TNNs.
The framework treats them as sub-networks of a weight-sharing teacher constructed by expanding the number of channels of the TNN.
Our code is available at https://github.com/GATECH-EIC/NetDistiller.
- Score: 19.93322471957759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Boosting the task accuracy of tiny neural networks (TNNs) has become a
fundamental challenge for enabling the deployments of TNNs on edge devices
which are constrained by strict limitations in terms of memory, computation,
bandwidth, and power supply. To this end, we propose a framework called
NetDistiller to boost the achievable accuracy of TNNs by treating them as
sub-networks of a weight-sharing teacher constructed by expanding the number of
channels of the TNN. Specifically, the target TNN model is jointly trained with
the weight-sharing teacher model via (1) gradient surgery to tackle the
gradient conflicts between them and (2) uncertainty-aware distillation to
mitigate the overfitting of the teacher model. Extensive experiments across
diverse tasks validate NetDistiller's effectiveness in boosting TNNs'
achievable accuracy over state-of-the-art methods. Our code is available at
https://github.com/GATECH-EIC/NetDistiller.
Related papers
- Joint A-SNN: Joint Training of Artificial and Spiking Neural Networks
via Self-Distillation and Weight Factorization [12.1610509770913]
Spiking Neural Networks (SNNs) mimic the spiking nature of brain neurons.
We propose a joint training framework of ANN and SNN, in which the ANN can guide the SNN's optimization.
Our method consistently outperforms many other state-of-the-art training methods.
arXiv Detail & Related papers (2023-05-03T13:12:17Z) - SPIDE: A Purely Spike-based Method for Training Feedback Spiking Neural
Networks [56.35403810762512]
Spiking neural networks (SNNs) with event-based computation are promising brain-inspired models for energy-efficient applications on neuromorphic hardware.
We study spike-based implicit differentiation on the equilibrium state (SPIDE) that extends the recently proposed training method.
arXiv Detail & Related papers (2023-02-01T04:22:59Z) - Quantum-Inspired Tensor Neural Networks for Option Pricing [4.3942901219301564]
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions.
A subset of such approaches of addressing the COD has led us to solving high-dimensional PDEs.
This has resulted in opening doors to solving a variety of real-world problems ranging from mathematical finance to control for industrial applications.
arXiv Detail & Related papers (2022-12-28T19:39:55Z) - Training Spiking Neural Networks with Local Tandem Learning [96.32026780517097]
Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors.
In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL)
We demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while having low computational complexity.
arXiv Detail & Related papers (2022-10-10T10:05:00Z) - Strengthening the Training of Convolutional Neural Networks By Using
Walsh Matrix [0.0]
We have modified the training and structure of DNN to increase the classification performance.
A minimum distance network (MDN) following the last layer of the convolutional neural network (CNN) is used as the classifier.
In different areas, it has been observed that a higher classification performance was obtained by using the DivFE with less number of nodes.
arXiv Detail & Related papers (2021-03-31T18:06:11Z) - S2-BNN: Bridging the Gap Between Self-Supervised Real and 1-bit Neural
Networks via Guided Distribution Calibration [74.5509794733707]
We present a novel guided learning paradigm from real-valued to distill binary networks on the final prediction distribution.
Our proposed method can boost the simple contrastive learning baseline by an absolute gain of 5.515% on BNNs.
Our method achieves substantial improvement over the simple contrastive learning baseline, and is even comparable to many mainstream supervised BNN methods.
arXiv Detail & Related papers (2021-02-17T18:59:28Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data
Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z) - On Self-Distilling Graph Neural Network [64.00508355508106]
We propose the first teacher-free knowledge distillation method for GNNs, termed GNN Self-Distillation (GNN-SD)
The method is built upon the proposed neighborhood discrepancy rate (NDR), which quantifies the non-smoothness of the embedded graph in an efficient way.
We also summarize a generic GNN-SD framework that could be exploited to induce other distillation strategies.
arXiv Detail & Related papers (2020-11-04T12:29:33Z) - Dynamically Throttleable Neural Networks (TNN) [24.052859278938858]
Conditional computation for Deep Neural Networks (DNNs) reduce overall computational load and improve model accuracy by running a subset of the network.
We present a runtime throttleable neural network (TNN) that can adaptively self-regulate its own performance target and computing resources.
arXiv Detail & Related papers (2020-11-01T20:17:42Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.