Synthetic data generation method for data-free knowledge distillation in
regression neural networks
- URL: http://arxiv.org/abs/2301.04338v2
- Date: Wed, 10 May 2023 03:01:50 GMT
- Authors: Tianxun Zhou, Keng-Hwee Chiam
- Abstract summary: Knowledge distillation is the technique of compressing a larger neural network, known as the teacher, into a smaller neural network, known as the student.
Previous work has proposed a data-free knowledge distillation method where synthetic data are generated using a generator model trained adversarially against the student model.
In this study, we investigate the behavior of various synthetic data generation methods and propose a new synthetic data generation strategy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge distillation is the technique of compressing a larger neural
network, known as the teacher, into a smaller neural network, known as the
student, while still trying to maintain the performance of the larger neural
network as much as possible. Existing methods of knowledge distillation are
mostly applicable to classification tasks. Many of them also require access to
the data used to train the teacher model. To address the problem of knowledge
distillation for regression tasks in the absence of original training data,
previous work has proposed a data-free knowledge distillation method where
synthetic data are generated using a generator model trained adversarially
against the student model. These synthetic data and their labels predicted by
the teacher model are then used to train the student model. In this study, we
investigate the behavior of various synthetic data generation methods and
propose a new synthetic data generation strategy that directly optimizes for a
large but bounded difference between the student and teacher model. Our results
on benchmark and case study experiments demonstrate that the proposed strategy
allows the student model to learn better and emulate the performance of the
teacher model more closely.
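The loop described in the abstract can be sketched roughly as follows. This is a minimal toy illustration, not the authors' implementation: the abstract does not give the loss, so the capped squared difference in `bounded_gap`, the finite-difference input search in `generate`, the stand-in `teacher` function, the cubic-feature `student`, and every hyperparameter are assumptions made for this example.

```python
import math
import random


def teacher(x):
    # Stand-in for a trained teacher network (assumption): a fixed nonlinear map.
    return math.sin(3.0 * x) + 0.5 * x


def features(x):
    # Cubic polynomial features; the toy student is linear in these (assumption).
    return [1.0, x, x * x, x ** 3]


def student(w, x):
    return sum(wi * fi for wi, fi in zip(w, features(x)))


def bounded_gap(w, x, bound):
    # Squared student-teacher difference, capped at `bound` so the search
    # has no incentive to chase arbitrarily large disagreements (assumed form).
    return min((student(w, x) - teacher(x)) ** 2, bound)


def generate(w, x, bound, ascent_steps=5, step=0.05, eps=1e-4):
    # Move a random starting input uphill on the bounded gap, using a
    # central finite difference in place of backpropagation.
    for _ in range(ascent_steps):
        grad = (bounded_gap(w, x + eps, bound)
                - bounded_gap(w, x - eps, bound)) / (2.0 * eps)
        x = max(-1.0, min(1.0, x + step * grad))  # stay inside the input range
    return x


def distill(steps=500, batch=16, lr=0.1, bound=1.0, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        # Synthetic inputs: random starts pushed toward large, bounded gaps.
        xs = [generate(w, rng.uniform(-1.0, 1.0), bound) for _ in range(batch)]
        grad = [0.0] * len(w)
        for x in xs:
            err = student(w, x) - teacher(x)  # the teacher output is the label
            for j, f in enumerate(features(x)):
                grad[j] += 2.0 * err * f / batch
        w = [wi - lr * g for wi, g in zip(w, grad)]  # one SGD step on the MSE
    return w


def mse(w, xs):
    return sum((student(w, x) - teacher(x)) ** 2 for x in xs) / len(xs)


if __name__ == "__main__":
    grid = [i / 50.0 - 1.0 for i in range(101)]
    print("untrained student MSE:", mse([0.0] * 4, grid))
    print("distilled student MSE:", mse(distill(), grid))
```

No original training data appears anywhere in the loop: the student only ever sees generated inputs and the teacher's predictions on them. Capping the gap is one simple way to read "large but bounded"; the paper may bound the difference differently.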
Related papers
- Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the level of transfer learning by comparing the refined model with the original network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z)
- Continual Learning of Diffusion Models with Generative Distillation [34.52513912701778]
Diffusion models are powerful generative models that achieve state-of-the-art performance in image synthesis.
In this paper, we propose generative distillation, an approach that distils the entire reverse process of a diffusion model.
arXiv Detail & Related papers (2023-11-23T14:33:03Z)
- Customizing Synthetic Data for Data-Free Student Learning [6.8080936803807734]
DFKD aims to obtain a lightweight student model without original training data.
To more effectively train the student model, synthetic data shall be customized to the current student learning ability.
We propose Customizing Synthetic Data for Data-Free Student Learning (CSD) in this paper.
arXiv Detail & Related papers (2023-07-10T13:17:29Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
- Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation [55.34995029082051]
We propose a method to learn to augment for data-scarce domain BERT knowledge distillation.
We show that the proposed method significantly outperforms state-of-the-art baselines on four different tasks.
arXiv Detail & Related papers (2021-01-20T13:07:39Z)
- Generative Adversarial Simulator [2.3986080077861787]
We introduce a simulator-free approach to knowledge distillation in the context of reinforcement learning.
A key challenge is having the student learn the multiplicity of cases that correspond to a given action.
This is the first demonstration of simulator-free knowledge distillation between a teacher and a student policy.
arXiv Detail & Related papers (2020-11-23T15:31:12Z)
- Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model [57.41841346459995]
We study how to train a student deep neural network for visual recognition by distilling knowledge from a blackbox teacher model in a data-efficient manner.
We propose an approach that blends mixup and active learning.
arXiv Detail & Related papers (2020-03-31T05:44:55Z)
- An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation [1.433758865948252]
We propose a new formalism of knowledge distillation for regression problems.
First, we propose a new loss function, teacher outlier loss rejection, which rejects outliers in training samples using teacher model predictions.
By considering the multi-task network, training of the feature extraction of student models becomes more effective.
arXiv Detail & Related papers (2020-02-28T08:46:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.