Data Distillation Can Be Like Vodka: Distilling More Times For Better
Quality
- URL: http://arxiv.org/abs/2310.06982v1
- Date: Tue, 10 Oct 2023 20:04:44 GMT
- Title: Data Distillation Can Be Like Vodka: Distilling More Times For Better
Quality
- Authors: Xuxi Chen, Yu Yang, Zhangyang Wang, Baharan Mirzasoleiman
- Abstract summary: We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
- Score: 78.6359306550245
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Dataset distillation aims to minimize the time and memory needed for training
deep networks on large datasets, by creating a small set of synthetic images
that yields generalization performance similar to that of the full dataset.
However, current dataset distillation techniques fall short, showing a notable
performance gap when compared to training on the original data. In this work,
we are the first to argue that using just one synthetic subset for distillation
will not yield optimal generalization performance. This is because the training
dynamics of deep networks change drastically over the course of training. Hence,
multiple synthetic subsets are required to capture the training dynamics at
different phases of training. To address this issue, we propose Progressive
Dataset Distillation (PDD). PDD synthesizes multiple small sets of synthetic
images, each conditioned on the previous sets, and trains the model on the
cumulative union of these subsets without requiring additional training time.
Our extensive experiments show that PDD can effectively improve the performance
of existing dataset distillation methods by up to 4.3%. In addition, our method
enables, for the first time, generating considerably larger synthetic datasets.
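As a rough illustration of the progressive scheme described above, the following minimal sketch runs on a toy linear-regression task. Everything here is a stand-in: the `synthesize` step merely selects hard real examples (PDD actually optimizes synthetic images), and the model is linear. The point is the structure: each stage produces a subset conditioned on the model trained so far, and training continues on the cumulative union.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd(w, X, y, lr=0.05, steps=100):
    """Plain gradient descent on squared loss for a linear model."""
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def synthesize(w, X, y, k):
    """Stand-in for the synthesis step: pick the k real examples the
    current model fits worst. PDD optimizes synthetic images instead;
    this placeholder only illustrates conditioning each subset on the
    model trained on the previous subsets."""
    err = (X @ w - y) ** 2
    idx = np.argsort(err)[-k:]
    return X[idx], y[idx]

# toy regression task standing in for the real dataset
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)

w = np.zeros(5)
X_cum = np.empty((0, 5))
y_cum = np.empty(0)
for stage in range(3):
    Xs, ys = synthesize(w, X, y, k=10)   # conditioned on the current model
    X_cum = np.vstack([X_cum, Xs])       # cumulative union of subsets
    y_cum = np.concatenate([y_cum, ys])
    w = sgd(w, X_cum, y_cum)             # keep training on the union

full_loss = np.mean((X @ w - y) ** 2)
```

The essential structural point is the loop: later subsets are generated against a model already shaped by earlier subsets, so they can target later phases of the training dynamics.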
Related papers
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching [19.8751746334929]
We present an algorithm that remains effective as the size of the synthetic dataset grows.
We experimentally find that the training stage of the trajectories we choose to match greatly affects the effectiveness of the distilled dataset.
In doing so, we successfully scale trajectory matching-based methods to larger synthetic datasets.
arXiv Detail & Related papers (2023-10-09T14:57:41Z) - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the samples that contribute most, based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - Accelerating Dataset Distillation via Model Augmentation [41.3027484667024]
We propose two model augmentation techniques, i.e. using early-stage models and parameter perturbation, to learn an informative synthetic set with significantly reduced training cost.
Our method achieves up to a 20x speedup, with performance on par with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-12T07:36:05Z) - Dataset Distillation by Matching Training Trajectories [75.9031209877651]
We propose a new formulation that optimizes our distilled data to guide networks to a state similar to that of networks trained on real data.
Given a network, we train it for several iterations on our distilled data and optimize the distilled data with respect to the distance between the synthetically trained parameters and the parameters trained on real data.
Our method handily outperforms existing methods and also allows us to distill higher-resolution visual data.
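The matching step described in this entry can be sketched on a toy linear model. Everything below is illustrative, not the paper's method: only the synthetic labels are learned (the paper optimizes the images themselves), a single student step is unrolled, and the meta-gradient through that step is derived by hand rather than by automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 5, 8

# real data, and an "expert" checkpoint pair obtained by training on it
X = rng.normal(size=(100, d))
w_real = rng.normal(size=d)
y = X @ w_real

w_start = np.zeros(d)              # earlier expert checkpoint
w_target = w_start.copy()          # later expert checkpoint
for _ in range(20):
    w_target -= 0.1 * 2 * X.T @ (X @ w_target - y) / len(y)

# distilled data: fixed random inputs, learnable labels only
Xs = rng.normal(size=(k, d))
ys = rng.normal(size=k)

inner_lr, outer_lr = 0.1, 20.0     # outer_lr tuned for this toy scale
for _ in range(5000):
    # one unrolled student step on the distilled data, from the earlier checkpoint
    w1 = w_start - inner_lr * 2 * Xs.T @ (Xs @ w_start - ys) / k
    # hand-derived gradient of ||w1 - w_target||^2 with respect to ys
    g_ys = (4 * inner_lr / k) * Xs @ (w1 - w_target)
    ys -= outer_lr * g_ys

w1 = w_start - inner_lr * 2 * Xs.T @ (Xs @ w_start - ys) / k
dist = np.linalg.norm(w1 - w_target)
```

After optimization, a student taking one step on the distilled data lands close to the expert's later checkpoint, which is the trajectory-matching objective in miniature.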
arXiv Detail & Related papers (2022-03-22T17:58:59Z) - Learning to Generate Synthetic Training Data using Gradient Matching and
Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
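To make the gradient-matching idea in the entry above concrete, here is a minimal sketch for a toy linear model. The simplifications are mine, not the paper's: only the synthetic labels are learned, the matching is done over a fixed pool of randomly drawn model weights, and the label update is a hand-derived gradient of the mismatch.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 5, 8, 200

# real data for a linear regression task
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad(Xm, ym, w):
    """Gradient of mean squared error for a linear model at weights w."""
    return 2 * Xm.T @ (Xm @ w - ym) / len(ym)

# synthetic set: fixed random inputs, learnable labels only
Xs = rng.normal(size=(k, d))
ys = np.zeros(k)

# fixed pool of randomly drawn model weights to match gradients at
W = rng.normal(size=(10, d))

def total_mismatch(ys):
    return sum(np.sum((grad(Xs, ys, w) - grad(X, y, w)) ** 2) for w in W)

initial = total_mismatch(ys)
lr = 0.02
for _ in range(3000):
    g = np.zeros(k)
    for w in W:
        diff = grad(Xs, ys, w) - grad(X, y, w)
        # hand-derived gradient of ||diff||^2 with respect to ys
        g += -(4 / k) * Xs @ diff
    ys -= lr * g
final = total_mismatch(ys)
```

After training, the synthetic set's loss gradient approximates the real data's gradient across the pool of weights, which is the quantity gradient-matching methods drive down.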
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including the accuracy of its information) and is not responsible for any consequences of its use.