Embarrassingly Simple Dataset Distillation
- URL: http://arxiv.org/abs/2311.07025v1
- Date: Mon, 13 Nov 2023 02:14:54 GMT
- Title: Embarrassingly Simple Dataset Distillation
- Authors: Yunzhen Feng, Ramakrishna Vedantam, Julia Kempe
- Abstract summary: We tackle dataset distillation at its core by treating it directly as a bilevel optimization problem.
A deeper dive into the nature of distilled data unveils pronounced intercorrelation.
We devise a boosting mechanism that generates distilled datasets that contain subsets with near optimal performance across different data budgets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation extracts a small set of synthetic training samples from
a large dataset, with the goal that a model trained on this small sample achieves
competitive performance on test data. In this work, we tackle dataset distillation at
its core by treating it directly as a bilevel optimization problem.
Re-examining the foundational back-propagation through time method, we study
the pronounced variance in the gradients, computational burden, and long-term
dependencies. To address these issues, we introduce an improved method, Random
Truncated Backpropagation Through Time (RaT-BPTT). RaT-BPTT incorporates a truncation
coupled with a random window, effectively stabilizing the gradients and
speeding up the optimization while covering long dependencies. This allows us
to establish a new state of the art on a variety of standard dataset benchmarks.
A deeper dive into the nature of distilled data unveils pronounced
intercorrelation. In particular, subsets of distilled datasets tend to exhibit
much worse performance than directly distilled smaller datasets of the same
size. Leveraging RaT-BPTT, we devise a boosting mechanism that generates
distilled datasets that contain subsets with near optimal performance across
different data budgets.
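
To make the bilevel structure and the random truncation concrete, the sketch below shows one outer update in PyTorch. It is a hedged illustration, not the authors' released code: the tiny MLP, unroll length, window size, learning rate, and the handling of labels are assumptions made for brevity, and distilled_x is assumed to be a leaf tensor with requires_grad=True.

```python
import torch
import torch.nn.functional as F

def forward(params, x):
    """Tiny two-layer MLP applied with an explicit parameter list [w1, b1, w2, b2]."""
    w1, b1, w2, b2 = params
    return F.relu(x @ w1 + b1) @ w2 + b2

def rat_bptt_outer_step(distilled_x, distilled_y, real_x, real_y,
                        init_params, max_unroll=40, window=10, inner_lr=0.01):
    """One outer update of the distilled data via Random Truncated BPTT (sketch).

    A freshly initialized network is trained on the distilled data for a randomly
    chosen number of inner SGD steps; gradients with respect to the distilled data
    flow only through the final `window` inner steps (the truncation), and the
    window's position shifts across outer iterations because the unroll length is
    resampled each time.
    """
    params = [p.clone().requires_grad_(True) for p in init_params]
    n_steps = torch.randint(window, max_unroll + 1, (1,)).item()

    for t in range(n_steps):
        track = t >= n_steps - window            # inside the backprop window?
        loss = F.cross_entropy(forward(params, distilled_x), distilled_y)
        grads = torch.autograd.grad(loss, params, create_graph=track)
        params = [p - inner_lr * g for p, g in zip(params, grads)]
        if not track:                            # truncate: discard the graph so far
            params = [p.detach().requires_grad_(True) for p in params]

    # Outer (meta) loss on real data; backprop reaches the distilled data.
    outer_loss = F.cross_entropy(forward(params, real_x), real_y)
    outer_loss.backward()
    return outer_loss.item()
```

An outer optimizer (for example, Adam on distilled_x) then consumes the accumulated gradient; the paper's actual schedule, learned soft labels, and averaging over random network initializations are omitted here.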
Related papers
- Distilling Long-tailed Datasets [13.330572317331198]
We propose a novel long-tailed dataset distillation method, Long-tailed dataset Aware distillation (LAD).
LAD reduces the distance between the student and the biased expert trajectories and prevents the tail class bias from being distilled to the synthetic dataset.
This work pioneers the field of long-tailed dataset distillation (LTDD), marking the first effective effort to distill long-tailed datasets.
arXiv Detail & Related papers (2024-08-24T15:36:36Z)
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores the hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Dataset Distillation via Adversarial Prediction Matching [24.487950991247764]
We propose an adversarial framework to solve the dataset distillation problem efficiently.
Our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets.
arXiv Detail & Related papers (2023-12-14T13:19:33Z)
- Sequential Subset Matching for Dataset Distillation [44.322842898670565]
We propose a new dataset distillation strategy called Sequential Subset Matching (SeqMatch).
Our analysis indicates that SeqMatch effectively addresses the coupling issue by sequentially generating the synthetic instances.
Our code is available at https://github.com/shqii1j/seqmatch.
arXiv Detail & Related papers (2023-11-02T19:49:11Z)
- Dataset Distillation Meets Provable Subset Selection [14.158845925610438]
Dataset distillation is proposed to compress a large training dataset into a smaller synthetic one that retains its performance.
We present a provable, sampling-based approach for initializing the distilled set by identifying important points and removing redundant ones in the data.
We further merge the idea of data subset selection with dataset distillation by training the distilled set on "important" sampled points during the training procedure instead of randomly sampling the next batch.
arXiv Detail & Related papers (2023-07-16T15:58:19Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- On the Size and Approximation Error of Distilled Sets [57.61696480305911]
We take a theoretical view on kernel ridge regression based methods of dataset distillation such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data.
A KRR solution can be generated using this distilled set of instances, which gives an approximation to the KRR solution optimized on the full input data (the standard closed forms involved are recalled after this list).
arXiv Detail & Related papers (2023-05-23T14:37:43Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with the regularization towards a flat trajectory, the weights trained on synthetic data are robust against accumulated-error perturbations.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo); a minimal sketch of the feature-regression idea appears after this list.
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z)
- LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation [63.20765930558542]
Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size.
This paper performs a principled evaluation of AL-based dataset distillation on one quarter (1/4th) of the large Semantic-KITTI dataset.
We observe that data augmentation achieves full dataset accuracy using only 60% of samples from the selected dataset configuration.
arXiv Detail & Related papers (2022-02-06T00:04:21Z)
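
For the entry "On the Size and Approximation Error of Distilled Sets" above, it helps to recall the object being approximated. In generic notation (not taken from that paper), the kernel ridge regression predictors fitted on the full data (X, y) and on a distilled set (X_s, y_s) with ridge parameter lambda have the standard closed forms:

```latex
f_X(x)     = k(x, X)\,\bigl(K_{XX} + \lambda I\bigr)^{-1} y,
\qquad
f_{X_s}(x) = k(x, X_s)\,\bigl(K_{X_s X_s} + \lambda I\bigr)^{-1} y_s .
```

The entry's claim is that, once the kernel is approximated with random Fourier features (RFF), a small distilled set X_s exists whose solution in the RFF feature space coincides with that of the full data, so that f_{X_s} approximates f_X.

For the FRePo entry, the following is a minimal sketch of the feature-regression idea, not the authors' implementation: the backbone, targets, and outer loss are illustrative assumptions, and FRePo's feature pooling and model pool are omitted. Distilled images are optimized so that a ridge-regression head, fitted in closed form on the backbone's features of the distilled set, predicts well on real batches.

```python
import torch
import torch.nn.functional as F

def feature_regression_loss(backbone, distilled_x, distilled_y_onehot,
                            real_x, real_y, ridge=1e-3):
    """Feature-regression-style distillation loss (FRePo-like sketch).

    Fit ridge regression from backbone features of the distilled set to its
    targets in closed form, then score that head on a real batch. The outer
    gradient flows through the closed-form solve into `distilled_x`.
    """
    phi_s = backbone(distilled_x)            # (n_s, d) distilled-set features
    phi_r = backbone(real_x)                 # (n_r, d) real-batch features

    # Kernel ridge regression in feature space:
    # predictions = K_rs (K_ss + ridge * I)^(-1) Y_s
    k_ss = phi_s @ phi_s.T
    k_rs = phi_r @ phi_s.T
    eye = torch.eye(k_ss.shape[0], device=k_ss.device)
    preds = k_rs @ torch.linalg.solve(k_ss + ridge * eye, distilled_y_onehot)

    # Treating the regression outputs as logits is a simplifying assumption.
    return F.cross_entropy(preds, real_y)
```

Backpropagating this loss into distilled_x (and, optionally, learnable targets) and stepping an outer optimizer gives the basic training loop.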