Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation
- URL: http://arxiv.org/abs/2108.11798v1
- Date: Thu, 26 Aug 2021 14:01:04 GMT
- Title: Efficient training of lightweight neural networks using Online
Self-Acquired Knowledge Distillation
- Authors: Maria Tzelepi and Anastasios Tefas
- Abstract summary: Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner.
We utilize the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space.
- Score: 51.66271681532262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge Distillation (KD) has been established as a highly promising
approach for training compact and faster models by transferring knowledge from
heavyweight and powerful models. However, KD in its conventional form remains an
enduring, computationally and memory-demanding process. In this paper, Online
Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the
performance of any deep neural model in an online manner. We utilize the k-NN
non-parametric density estimation technique to estimate the unknown probability
distributions of the data samples in the output feature space. This allows us to
directly estimate the posterior class probabilities of the data samples, and we
use them as soft labels that encode explicit information about the similarities
of the data with the classes, with negligible impact on the computational cost.
The experimental evaluation on four datasets validates the effectiveness of the
proposed method.
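To make the mechanism above concrete, below is a minimal PyTorch sketch of self-acquired k-NN soft labels computed online from a batch's output features and combined with the usual cross-entropy loss. The batch-local neighbourhood, the Euclidean distance, and the hyperparameters k and alpha are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch (PyTorch) of the self-acquired k-NN soft-label idea described
# above. The batch-local neighbourhood, the Euclidean distance, and the
# hyperparameters k and alpha are illustrative assumptions, not the paper's
# exact formulation.
import torch
import torch.nn.functional as F

def knn_soft_labels(features, labels, num_classes, k=5):
    """Estimate posterior class probabilities of each sample from the class
    membership of its k nearest neighbours in the output feature space."""
    feats = features.detach()                 # soft labels are targets, not gradients
    dists = torch.cdist(feats, feats)         # pairwise Euclidean distances
    dists.fill_diagonal_(float("inf"))        # a sample is not its own neighbour
    _, nn_idx = dists.topk(k, dim=1, largest=False)
    nn_labels = labels[nn_idx]                                    # (N, k)
    return F.one_hot(nn_labels, num_classes).float().mean(dim=1)  # (N, C) soft labels

def osakd_style_loss(logits, features, labels, num_classes, k=5, alpha=0.1):
    """Cross-entropy on the ground-truth labels plus a KL term that distils the
    self-acquired soft labels, computed online from the same forward pass."""
    ce = F.cross_entropy(logits, labels)
    soft = knn_soft_labels(features, labels, num_classes, k)
    kd = F.kl_div(F.log_softmax(logits, dim=1), soft, reduction="batchmean")
    return ce + alpha * kd
```

In a training loop, features would be the output-space embeddings and logits the classifier outputs from the same forward pass, so no separate teacher network or extra pass is required.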
Related papers
- Condensed Sample-Guided Model Inversion for Knowledge Distillation [42.91823325342862]
Knowledge distillation (KD) is a key element in neural network compression that allows knowledge transfer from a pre-trained teacher model to a more compact student model.
KD relies on access to the training dataset, which may not always be fully available due to privacy concerns or logistical issues related to the size of the data.
In this paper, we consider condensed samples as a form of supplementary information, and introduce a method for using them to better approximate the target data distribution.
arXiv Detail & Related papers (2024-08-25T14:43:27Z)
- CALICO: Confident Active Learning with Integrated Calibration [11.978551396144532]
We propose an active learning (AL) framework that self-calibrates the confidence used for sample selection during the training process.
We show improved classification performance compared to a softmax-based classifier with fewer labeled samples.
arXiv Detail & Related papers (2024-07-02T15:05:19Z)
- Small Scale Data-Free Knowledge Distillation [37.708282211941416]
We propose Small Scale Data-free Knowledge Distillation (SSD-KD).
SSD-KD balances synthetic samples and uses a priority sampling function to select proper samples.
It can perform distillation training conditioned on an extremely small scale of synthetic samples.
arXiv Detail & Related papers (2024-06-12T05:09:41Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline (see the sketch after this list).
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has recently been proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT that overcomes the limitations of prior distillation approaches with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for real calibration data in the quantization method.
We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images.
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
- Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z)
- Neural Capacity Estimators: How Reliable Are They? [14.904387585122851]
We study the performance of the mutual information neural estimator (MINE), the smoothed mutual information lower-bound estimator (SMILE), and the directed information neural estimator (DINE).
We evaluate these algorithms in terms of their ability to learn the input distributions that are capacity-approaching for the AWGN channel, the optical intensity channel, and the peak power-constrained AWGN channel.
arXiv Detail & Related papers (2021-11-14T18:14:53Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that, under reasonable conditions, MixKD gives rise to a smaller gap between the generalization error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
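As referenced in the KAKURENBO entry above, the following is a minimal sketch of epoch-wise sample hiding in that spirit. Using the current per-sample loss as the importance score and a fixed hiding fraction are simplifying assumptions; the paper's actual criterion and schedule are not reproduced here.

```python
# Minimal sketch of adaptively hiding the least-important samples each epoch,
# in the spirit of the KAKURENBO entry above. The per-sample-loss importance
# score and the fixed hide_fraction are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def select_visible_indices(model, dataset, device, hide_fraction=0.2, batch_size=256):
    """Score every sample with its current loss and hide the lowest-loss
    (least informative) fraction for the next epoch."""
    model.eval()
    losses = []
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            losses.append(F.cross_entropy(model(x), y, reduction="none").cpu())
    losses = torch.cat(losses)
    n_keep = int(len(losses) * (1.0 - hide_fraction))
    keep = losses.argsort(descending=True)[:n_keep]   # keep the hardest samples
    return keep.tolist()

# Per epoch (illustrative usage): train only on the visible subset.
# visible = Subset(train_set, select_visible_indices(model, train_set, device))
# loader = DataLoader(visible, batch_size=128, shuffle=True)
```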