Practical Dataset Distillation Based on Deep Support Vectors
- URL: http://arxiv.org/abs/2405.00348v1
- Date: Wed, 1 May 2024 06:41:27 GMT
- Title: Practical Dataset Distillation Based on Deep Support Vectors
- Authors: Hyunho Lee, Junhoo Lee, Nojun Kwak,
- Abstract summary: In this paper, we focus on dataset distillation in practical scenarios with access to only a fraction of the entire dataset.
We introduce a novel distillation method that augments the conventional process by incorporating general model knowledge via the addition of Deep KKT (DKKT) loss.
In practical settings, our approach showed improved performance compared to the baseline distribution matching distillation method on the CIFAR-10 dataset.
- Score: 27.16222034423108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional dataset distillation requires significant computational resources and assumes access to the entire dataset, an assumption impractical as it presumes all data resides on a central server. In this paper, we focus on dataset distillation in practical scenarios with access to only a fraction of the entire dataset. We introduce a novel distillation method that augments the conventional process by incorporating general model knowledge via the addition of Deep KKT (DKKT) loss. In practical settings, our approach showed improved performance compared to the baseline distribution matching distillation method on the CIFAR-10 dataset. Additionally, we present experimental evidence that Deep Support Vectors (DSVs) offer unique information to the original distillation, and their integration results in enhanced performance.
Related papers
- Mitigating Bias in Dataset Distillation [62.79454960378792]
We study the impact of bias inside the original dataset on the performance of dataset distillation.
We introduce a simple yet highly effective approach based on a sample reweighting scheme utilizing kernel density estimation.
arXiv Detail & Related papers (2024-06-06T18:52:28Z) - Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification [0.0]
Main goal is to push further the performance of prototype-based soft-labels distillation in terms of classification accuracy.
Experimental studies trace the capability of the method to distill the data, but also the opportunity to act as an augmentation method.
arXiv Detail & Related papers (2024-03-25T19:15:19Z) - DD-RobustBench: An Adversarial Robustness Benchmark for Dataset Distillation [25.754877176280708]
We introduce a comprehensive benchmark that is the most extensive to date for evaluating the adversarial robustness of distilled datasets in a unified way.
Our benchmark significantly expands upon prior efforts by incorporating the latest advancements such as TESLA and SRe2L.
We also discovered that incorporating distilled data into the training batches of the original dataset can yield to improvement of robustness.
arXiv Detail & Related papers (2024-03-20T06:00:53Z) - Towards Adversarially Robust Dataset Distillation by Curvature Regularization [11.463315774971857]
We study how to embed adversarial robustness in distilled datasets, so that models trained on these datasets maintain the high accuracy and acquire better adversarial robustness.
We propose a new method that achieves this goal by incorporating curvature regularization into the distillation process with much less computational overhead than standard adversarial training.
arXiv Detail & Related papers (2024-03-15T06:31:03Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - Evaluating the effect of data augmentation and BALD heuristics on
distillation of Semantic-KITTI dataset [63.20765930558542]
Active Learning has remained relatively unexplored for LiDAR perception tasks in autonomous driving datasets.
We evaluate Bayesian active learning methods applied to the task of dataset distillation or core subset selection.
We also study the effect of application of data augmentation within Bayesian AL based dataset distillation.
arXiv Detail & Related papers (2023-02-21T13:56:47Z) - A Comprehensive Survey of Dataset Distillation [73.15482472726555]
It has become challenging to handle the unlimited growth of data with limited computing power.
Deep learning technology has developed unprecedentedly in the last decade.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.