Practical Knowledge Distillation: Using DNNs to Beat DNNs
- URL: http://arxiv.org/abs/2302.12360v1
- Date: Thu, 23 Feb 2023 22:53:02 GMT
- Title: Practical Knowledge Distillation: Using DNNs to Beat DNNs
- Authors: Chung-Wei Lee, Pavlos Anastasios Apostolopoulos, Igor L. Markov
- Abstract summary: We explore data and model distillation, as well as data denoising.
These techniques improve both gradient-boosting models and a specialized DNN architecture.
For an industry end-to-end real-time ML platform with 4M production inferences per second, we develop a model-training workflow based on data sampling.
Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide.
- Score: 8.121769391666547
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: For tabular data sets, we explore data and model distillation, as well as
data denoising. These techniques improve both gradient-boosting models and a
specialized DNN architecture. While gradient boosting is known to outperform
DNNs on tabular data, we close the gap for datasets with 100K+ rows and give
DNNs an advantage on small data sets. We extend these results with input-data
distillation and optimized ensembling to help DNN performance match or exceed
that of gradient boosting. As a theoretical justification of our practical
method, we prove its equivalence to classical cross-entropy knowledge
distillation. We also qualitatively explain the superiority of DNN ensembles
over XGBoost on small data sets. For an industry end-to-end real-time ML
platform with 4M production inferences per second, we develop a model-training
workflow based on data sampling that distills ensembles of models into a single
gradient-boosting model favored for high-performance real-time inference,
without performance loss. Empirical evaluation shows that the proposed
combination of methods consistently improves model accuracy over prior best
models across several production applications deployed worldwide.
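
As a concrete illustration of the distillation recipe described above, the sketch below scores training data with a small DNN ensemble and fits a single gradient-boosting model to the ensemble's averaged soft predictions. The dataset, ensemble size, and hyperparameters are placeholder assumptions chosen for brevity; scikit-learn MLPs and XGBoost stand in for the paper's specialized DNN architecture and production pipeline, which are not reproduced here.

```python
# Minimal sketch: distill a DNN ensemble into one gradient-boosting model
# using soft labels. All sizes and hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
import xgboost as xgb

X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Teacher: a small ensemble of DNNs trained on bootstrap samples.
teachers = []
rng = np.random.default_rng(0)
for seed in range(5):
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=50, random_state=seed)
    mlp.fit(X_train[idx], y_train[idx])
    teachers.append(mlp)

# 2) Soft labels: average the teachers' predicted probabilities.
soft = np.mean([t.predict_proba(X_train)[:, 1] for t in teachers], axis=0)

# 3) Student: a single gradient-boosting model fit to the soft labels
#    under a logistic objective, suitable for fast real-time inference.
student = xgb.XGBRegressor(objective="reg:logistic", n_estimators=300, max_depth=6)
student.fit(X_train, soft)

pred = (student.predict(X_test) > 0.5).astype(int)
print("student accuracy:", (pred == y_test).mean())
```

Regressing soft probabilities under a logistic objective is one standard way to realize the cross-entropy-style distillation the abstract refers to; the data-sampling workflow used in the production platform is omitted from this sketch.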
Related papers
- Novel Representation Learning Technique using Graphs for Performance Analytics [0.0]
We propose the novel idea of transforming performance data into graphs to leverage advances in Graph Neural Network (GNN) techniques.
In contrast to other Machine Learning application domains, such as social networks, the graph is not given; instead, we need to build it.
We evaluate the effectiveness of the generated embeddings from GNNs based on how well they make even a simple feed-forward neural network perform for regression tasks.
arXiv Detail & Related papers (2024-01-19T16:34:37Z)
- Not All Data Matters: An End-to-End Adaptive Dataset Pruning Framework for Enhancing Model Performance and Efficiency [9.460023981858319]
We propose an end-to-end Adaptive DAtaset PRUNing framework called AdaPruner.
AdaPruner iteratively prunes redundant samples to an expected pruning ratio.
It can still significantly enhance model performance even after pruning up to 10-30% of the training data.
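
A generic sketch of this kind of iterative pruning is shown below; the per-sample loss criterion and fixed pruning schedule are illustrative assumptions, not AdaPruner's actual scoring rule.

```python
# Generic iterative dataset-pruning sketch: repeatedly drop the lowest-loss
# ("easiest") samples until a target pruning ratio is reached.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
keep = np.arange(len(X))     # indices of samples still in the training set
target_ratio = 0.2           # prune 20% of the data overall
steps = 4                    # prune in several small rounds

for _ in range(steps):
    model = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    proba = model.predict_proba(X[keep])[np.arange(len(keep)), y[keep]]
    loss = -np.log(np.clip(proba, 1e-12, None))   # per-sample cross-entropy
    n_drop = int(len(X) * target_ratio / steps)
    keep = keep[np.argsort(loss)[n_drop:]]        # drop the easiest samples

print(f"kept {len(keep)} of {len(X)} samples")
```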
arXiv Detail & Related papers (2023-12-09T16:01:21Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
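
The core idea of distribution matching can be sketched in a few lines: optimize a small synthetic set so that its embedding statistics match those of the real data under randomly drawn encoders. The data shapes, encoder, and first-moment loss below are toy assumptions, not the paper's actual objective.

```python
# Toy distribution-matching sketch for dataset condensation.
import torch
from torch import nn

torch.manual_seed(0)
real = torch.randn(5000, 32)                    # stand-in for real feature vectors
syn = torch.randn(50, 32, requires_grad=True)   # learnable condensed set
opt = torch.optim.Adam([syn], lr=0.01)

for step in range(500):
    # Re-sample a random encoder each step so matching is not tied to one network.
    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
    loss = (encoder(real).mean(0) - encoder(syn).mean(0)).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final matching loss:", loss.item())
```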
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for real calibration data in post-training quantization.
We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images.
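
A minimal sketch of the underlying workflow is shown below, using PyTorch eager-mode static quantization with a stand-in source of synthetic calibration batches (random tensors here; samples from a generator such as StyleGAN2-ADA or DiStyleGAN would take their place).

```python
# Post-training static quantization calibrated on synthetic data (sketch).
import torch
from torch import nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.fc1 = nn.Linear(128, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc2(self.relu(self.fc1(x)))
        return self.dequant(x)

model = SmallNet().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)

# Calibration pass over synthetic samples instead of held-out real data.
for _ in range(32):
    synthetic_batch = torch.randn(16, 128)   # replace with generator output
    prepared(synthetic_batch)

quantized = torch.ao.quantization.convert(prepared)
print(quantized(torch.randn(1, 128)))
```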
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
- Accelerating Dataset Distillation via Model Augmentation [41.3027484667024]
We propose two model augmentation techniques, i.e., using early-stage models and parameter perturbation, to learn an informative synthetic set with significantly reduced training cost.
Our method achieves up to 20x speedup with performance on par with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-12T07:36:05Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on the synthetic data are robust against accumulated-error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph-structured data (DFAD-GNN).
Specifically, DFAD-GNN employs a generative adversarial network with three components: a pre-trained teacher model and a student model act as two discriminators, while a generator derives training graphs used to distill knowledge from the teacher into the student.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
arXiv Detail & Related papers (2022-05-08T08:19:40Z)
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
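
Gradient matching, one of the techniques mentioned, can be illustrated with a toy sketch: learn synthetic inputs whose training gradient resembles the gradient produced by real data on a fixed network. The tiny linear model and single fixed network below are simplifying assumptions; the generative teaching networks and implicit differentiation used in the paper are not shown.

```python
# Toy gradient-matching sketch: optimize synthetic inputs so the gradient they
# induce on a fixed network matches the gradient induced by real data.
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Linear(20, 2)                              # tiny model for illustration
real_x, real_y = torch.randn(256, 20), torch.randint(0, 2, (256,))
syn_x = torch.randn(10, 20, requires_grad=True)     # learnable synthetic inputs
syn_y = torch.randint(0, 2, (10,))                  # fixed synthetic labels
opt = torch.optim.Adam([syn_x], lr=0.05)

def param_grads(x, y, create_graph=False):
    loss = F.cross_entropy(net(x), y)
    return torch.autograd.grad(loss, net.parameters(), create_graph=create_graph)

# Target gradient from real data (the network stays fixed in this toy version).
g_real = [g.detach() for g in param_grads(real_x, real_y)]

for step in range(300):
    g_syn = param_grads(syn_x, syn_y, create_graph=True)
    match = sum(((a - b) ** 2).sum() for a, b in zip(g_real, g_syn))
    opt.zero_grad()
    match.backward()
    opt.step()

print("final gradient-matching loss:", match.item())
```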
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
- PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks [61.51515750218049]
This paper focuses on the Data Augmentation for low-resource Natural Language Understanding (NLU) tasks.
We propose the Prompt-based Data Augmentation model (PromDA), which trains only small-scale Soft Prompts.
PromDA generates synthetic data via two different views and filters out the low-quality data using NLU models.
arXiv Detail & Related papers (2022-02-25T05:09:27Z)
- Self-Competitive Neural Networks [0.0]
Deep Neural Networks (DNNs) have improved classification accuracy in many applications.
One challenge in training a DNN is its need for a rich dataset to increase accuracy and avoid overfitting.
Recently, researchers have worked extensively to propose methods for data augmentation.
In this paper, we generate adversarial samples to refine the Domains of Attraction (DoAs) of each class. At each stage, we use the model learned from the primary data and the adversarial data generated so far to manipulate the primary data in a way that looks complicated to the DNN.
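
The generic pattern of augmenting primary data with adversarially perturbed "hard" variants can be sketched as follows; the FGSM-style step below is an illustrative stand-in, not SCNN's exact generation procedure.

```python
# FGSM-style sketch of generating hard variants of primary data for augmentation.
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(64, 20)
y = torch.randint(0, 2, (64,))

def adversarial_samples(model, x, y, eps=0.1):
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each sample in the direction that increases its loss.
    return (x_adv + eps * x_adv.grad.sign()).detach()

x_aug = torch.cat([x, adversarial_samples(model, x, y)])
y_aug = torch.cat([y, y])
print("augmented training set size:", len(x_aug))
```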
arXiv Detail & Related papers (2020-08-22T12:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.