Practical Knowledge Distillation: Using DNNs to Beat DNNs
- URL: http://arxiv.org/abs/2302.12360v1
- Date: Thu, 23 Feb 2023 22:53:02 GMT
- Title: Practical Knowledge Distillation: Using DNNs to Beat DNNs
- Authors: Chung-Wei Lee, Pavlos Anastasios Apostolopoulos, Igor L. Markov
- Abstract summary: We explore data and model distillation, as well as data denoising.
These techniques improve both gradient-boosting models and a specialized DNN architecture.
For an industry end-to-end real-time ML platform with 4M production inferences per second, we develop a model-training workflow based on data sampling.
Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide.
- Score: 8.121769391666547
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: For tabular data sets, we explore data and model distillation, as well as
data denoising. These techniques improve both gradient-boosting models and a
specialized DNN architecture. While gradient boosting is known to outperform
DNNs on tabular data, we close the gap for datasets with 100K+ rows and give
DNNs an advantage on small data sets. We extend these results with input-data
distillation and optimized ensembling to help DNN performance match or exceed
that of gradient boosting. As a theoretical justification of our practical
method, we prove its equivalence to classical cross-entropy knowledge
distillation. We also qualitatively explain the superiority of DNN ensembles
over XGBoost on small data sets. For an industry end-to-end real-time ML
platform with 4M production inferences per second, we develop a model-training
workflow based on data sampling that distills ensembles of models into a single
gradient-boosting model favored for high-performance real-time inference,
without performance loss. Empirical evaluation shows that the proposed
combination of methods consistently improves model accuracy over prior best
models across several production applications deployed worldwide.
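
As a concrete illustration of the distillation recipe described above, the sketch below scores training data with a small DNN ensemble and fits a single gradient-boosting model to the ensemble's averaged soft predictions. The dataset, ensemble size, and hyperparameters are placeholder assumptions chosen for brevity; scikit-learn MLPs and XGBoost stand in for the paper's specialized DNN architecture and production pipeline, which are not reproduced here.

```python
# Minimal sketch: distill a DNN ensemble into one gradient-boosting model
# using soft labels. All sizes and hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
import xgboost as xgb

X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Teacher: a small ensemble of DNNs trained on bootstrap samples.
teachers = []
rng = np.random.default_rng(0)
for seed in range(5):
    idx = rng.choice(len(X_train), size=len(X_train), replace=True)
    mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=50, random_state=seed)
    mlp.fit(X_train[idx], y_train[idx])
    teachers.append(mlp)

# 2) Soft labels: average the teachers' predicted probabilities.
soft = np.mean([t.predict_proba(X_train)[:, 1] for t in teachers], axis=0)

# 3) Student: a single gradient-boosting model fit to the soft labels
#    under a logistic objective, suitable for fast real-time inference.
student = xgb.XGBRegressor(objective="reg:logistic", n_estimators=300, max_depth=6)
student.fit(X_train, soft)

pred = (student.predict(X_test) > 0.5).astype(int)
print("student accuracy:", (pred == y_test).mean())
```

Regressing soft probabilities under a logistic objective is one standard way to realize the cross-entropy-style distillation the abstract refers to; the data-sampling workflow used in the production platform is omitted from this sketch.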
Related papers
- Novel Representation Learning Technique using Graphs for Performance Analytics [0.0]
We propose the novel idea of transforming performance data into graphs to leverage advances in Graph Neural Network (GNN) techniques.
In contrast to other Machine Learning application domains, such as social networks, the graph is not given; instead, we need to build it.
We evaluate the effectiveness of the generated embeddings from GNNs based on how well they make even a simple feed-forward neural network perform for regression tasks.
arXiv Detail & Related papers (2024-01-19T16:34:37Z)
- Not All Data Matters: An End-to-End Adaptive Dataset Pruning Framework for Enhancing Model Performance and Efficiency [9.460023981858319]
We propose an end-to-end Adaptive DAtaset PRUNing framework called AdaPruner.
AdaPruner iteratively prunes redundant samples to an expected pruning ratio.
It can still significantly enhance model performance even after pruning up to 10-30% of the training data.
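
A generic sketch of this kind of iterative pruning is shown below; the per-sample loss criterion and fixed pruning schedule are illustrative assumptions, not AdaPruner's actual scoring rule.

```python
# Generic iterative dataset-pruning sketch: repeatedly drop the lowest-loss
# ("easiest") samples until a target pruning ratio is reached.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
keep = np.arange(len(X))     # indices of samples still in the training set
target_ratio = 0.2           # prune 20% of the data overall
steps = 4                    # prune in several small rounds

for _ in range(steps):
    model = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    proba = model.predict_proba(X[keep])[np.arange(len(keep)), y[keep]]
    loss = -np.log(np.clip(proba, 1e-12, None))   # per-sample cross-entropy
    n_drop = int(len(X) * target_ratio / steps)
    keep = keep[np.argsort(loss)[n_drop:]]        # drop the easiest samples

print(f"kept {len(keep)} of {len(X)} samples")
```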
arXiv Detail & Related papers (2023-12-09T16:01:21Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
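
The core idea of distribution matching can be sketched in a few lines: optimize a small synthetic set so that its embedding statistics match those of the real data under randomly drawn encoders. The data shapes, encoder, and first-moment loss below are toy assumptions, not the paper's actual objective.

```python
# Toy distribution-matching sketch for dataset condensation.
import torch
from torch import nn

torch.manual_seed(0)
real = torch.randn(5000, 32)                    # stand-in for real feature vectors
syn = torch.randn(50, 32, requires_grad=True)   # learnable condensed set
opt = torch.optim.Adam([syn], lr=0.01)

for step in range(500):
    # Re-sample a random encoder each step so matching is not tied to one network.
    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
    loss = (encoder(real).mean(0) - encoder(syn).mean(0)).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final matching loss:", loss.item())
```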
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for real calibration data in post-training quantization.
We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images.
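
A minimal sketch of the underlying workflow is shown below, using PyTorch eager-mode static quantization with a stand-in source of synthetic calibration batches (random tensors here; samples from a generator such as StyleGAN2-ADA or DiStyleGAN would take their place).

```python
# Post-training static quantization calibrated on synthetic data (sketch).
import torch
from torch import nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.fc1 = nn.Linear(128, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc2(self.relu(self.fc1(x)))
        return self.dequant(x)

model = SmallNet().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)

# Calibration pass over synthetic samples instead of held-out real data.
for _ in range(32):
    synthetic_batch = torch.randn(16, 128)   # replace with generator output
    prepared(synthetic_batch)

quantized = torch.ao.quantization.convert(prepared)
print(quantized(torch.randn(1, 128)))
```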
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
- Accelerating Dataset Distillation via Model Augmentation [41.3027484667024]
We propose two model augmentation techniques, i.e., using early-stage models and parameter perturbation, to learn an informative synthetic set with significantly reduced training cost.
Our method achieves up to 20x speedup with performance on par with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-12T07:36:05Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on the synthetic data are robust against accumulated-error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Data-Free Adversarial Knowledge Distillation for Graph Neural Networks [62.71646916191515]
We propose the first end-to-end framework for data-free adversarial knowledge distillation on graph-structured data (DFAD-GNN).
Specifically, DFAD-GNN employs a generative adversarial network with three components: a pre-trained teacher model and a student model act as two discriminators, while a generator derives training graphs used to distill knowledge from the teacher into the student.
Our DFAD-GNN significantly surpasses state-of-the-art data-free baselines in the graph classification task.
arXiv Detail & Related papers (2022-05-08T08:19:40Z)
- Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
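
Gradient matching, one of the techniques mentioned, can be illustrated with a toy sketch: learn synthetic inputs whose training gradient resembles the gradient produced by real data on a fixed network. The tiny linear model and single fixed network below are simplifying assumptions; the generative teaching networks and implicit differentiation used in the paper are not shown.

```python
# Toy gradient-matching sketch: optimize synthetic inputs so the gradient they
# induce on a fixed network matches the gradient induced by real data.
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Linear(20, 2)                              # tiny model for illustration
real_x, real_y = torch.randn(256, 20), torch.randint(0, 2, (256,))
syn_x = torch.randn(10, 20, requires_grad=True)     # learnable synthetic inputs
syn_y = torch.randint(0, 2, (10,))                  # fixed synthetic labels
opt = torch.optim.Adam([syn_x], lr=0.05)

def param_grads(x, y, create_graph=False):
    loss = F.cross_entropy(net(x), y)
    return torch.autograd.grad(loss, net.parameters(), create_graph=create_graph)

# Target gradient from real data (the network stays fixed in this toy version).
g_real = [g.detach() for g in param_grads(real_x, real_y)]

for step in range(300):
    g_syn = param_grads(syn_x, syn_y, create_graph=True)
    match = sum(((a - b) ** 2).sum() for a, b in zip(g_real, g_syn))
    opt.zero_grad()
    match.backward()
    opt.step()

print("final gradient-matching loss:", match.item())
```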
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
- PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks [61.51515750218049]
This paper focuses on the Data Augmentation for low-resource Natural Language Understanding (NLU) tasks.
We propose the Prompt-based Data Augmentation model (PromDA), which trains only small-scale Soft Prompts.
PromDA generates synthetic data via two different views and filters out the low-quality data using NLU models.
arXiv Detail & Related papers (2022-02-25T05:09:27Z)
- Self-Competitive Neural Networks [0.0]
Deep Neural Networks (DNNs) have improved classification accuracy in many applications.
One challenge in training a DNN is its need for a rich dataset to increase accuracy and avoid overfitting.
Recently, researchers have worked extensively to propose methods for data augmentation.
In this paper, we generate adversarial samples to refine the Domains of Attraction (DoAs) of each class. At each stage, we use the model learned from the primary data and the adversarial data generated so far to manipulate the primary data in a way that looks complicated to the DNN.
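
The generic pattern of augmenting primary data with adversarially perturbed "hard" variants can be sketched as follows; the FGSM-style step below is an illustrative stand-in, not SCNN's exact generation procedure.

```python
# FGSM-style sketch of generating hard variants of primary data for augmentation.
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(64, 20)
y = torch.randint(0, 2, (64,))

def adversarial_samples(model, x, y, eps=0.1):
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Move each sample in the direction that increases its loss.
    return (x_adv + eps * x_adv.grad.sign()).detach()

x_aug = torch.cat([x, adversarial_samples(model, x, y)])
y_aug = torch.cat([y, y])
print("augmented training set size:", len(x_aug))
```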
arXiv Detail & Related papers (2020-08-22T12:28:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.