Generalizing Dataset Distillation via Deep Generative Prior
- URL: http://arxiv.org/abs/2305.01649v2
- Date: Wed, 3 May 2023 20:19:13 GMT
- Title: Generalizing Dataset Distillation via Deep Generative Prior
- Authors: George Cazenavette and Tongzhou Wang and Antonio Torralba and Alexei A. Efros and Jun-Yan Zhu
- Abstract summary: Dataset Distillation aims to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
- Score: 75.9031209877651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset Distillation aims to distill an entire dataset's knowledge into a few
synthetic images. The idea is to synthesize a small number of synthetic data
points that, when given to a learning algorithm as training data, result in a
model approximating one trained on the original data. Despite recent progress
in the field, existing dataset distillation methods fail to generalize to new
architectures and scale to high-resolution datasets. To overcome the above
issues, we propose to use the learned prior from pre-trained deep generative
models to synthesize the distilled data. To achieve this, we present a new
optimization algorithm that distills a large number of images into a few
intermediate feature vectors in the generative model's latent space. Our method
augments existing techniques, significantly improving cross-architecture
generalization in all settings.
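Below is a minimal sketch of the core idea described in the abstract, not the authors' implementation: a handful of learnable latent codes are optimized through a frozen, pretrained generator under a simple gradient-matching objective, so the decoded images serve as the distilled dataset. The `generator`, `student`, hyperparameters, and the naive label assignment are illustrative assumptions; the paper plugs the same latent parameterization into several existing distillation losses.

```python
import torch
import torch.nn.functional as F

def distill_into_latents(generator, student, real_images, real_labels,
                         num_latents=10, latent_dim=512, steps=1000, lr=0.01):
    """Sketch: optimize a few latent codes so generator(z) acts as distilled data.

    `generator` stands in for any frozen, pretrained generative model mapping
    latent codes to images; `student` is a small classifier. Both are
    hypothetical placeholders.
    """
    device = real_images.device
    # Learnable latent codes and their labels form the distilled dataset.
    # Labels are assigned naively from the real batch purely for illustration.
    z = torch.randn(num_latents, latent_dim, device=device, requires_grad=True)
    syn_labels = real_labels[:num_latents].clone()
    opt = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        syn_images = generator(z)  # decode latents through the frozen generator

        # Gradients of the student loss on real vs. synthetic batches.
        real_loss = F.cross_entropy(student(real_images), real_labels)
        syn_loss = F.cross_entropy(student(syn_images), syn_labels)
        params = [p for p in student.parameters() if p.requires_grad]
        g_real = torch.autograd.grad(real_loss, params)
        g_syn = torch.autograd.grad(syn_loss, params, create_graph=True)

        # Match the two gradient sets; backprop flows through the generator
        # into the latent codes only (the generator itself is never updated).
        match = sum(F.mse_loss(gs, gr.detach()) for gs, gr in zip(g_syn, g_real))
        opt.zero_grad()
        match.backward()
        opt.step()

    return generator(z).detach(), syn_labels
```

In this parameterization the optimized variables are latent vectors rather than raw pixels, which is what lets the generative prior regularize the distilled images.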
Related papers
- Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD).
This method systematically explores the hierarchical layers within generative adversarial networks (GANs).
In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z) - Generative Dataset Distillation: Balancing Global Structure and Local Details [49.20086587208214]
We propose a new dataset distillation method that considers balancing global structure and local details.
Our method involves using a conditional generative adversarial network to generate the distilled dataset.
arXiv Detail & Related papers (2024-04-26T23:46:10Z) - Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification [0.0]
The main goal is to further improve the classification accuracy of prototype-based soft-labels distillation.
Experimental studies demonstrate not only the method's ability to distill the data, but also its potential to act as an augmentation method.
arXiv Detail & Related papers (2024-03-25T19:15:19Z) - One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Dataset Distillation using Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z) - Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
Progressive dataset distillation (PDD) synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets (a schematic of this loop appears after the list of related papers below).
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z) - Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching [19.8751746334929]
We present an algorithm that remains effective as the size of the synthetic dataset grows.
We experimentally find that the training stage of the trajectories we choose to match greatly affects the effectiveness of the distilled dataset.
In doing so, we successfully scale trajectory matching-based methods to larger synthetic datasets.
arXiv Detail & Related papers (2023-10-09T14:57:41Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that, with regularization toward a flat trajectory, weights trained on the synthetic data are robust against perturbations from accumulated errors.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - Dataset Distillation using Neural Feature Regression [32.53291298089172]
We develop an algorithm for dataset distillation using neural Feature Regression with Pooling (FRePo).
FRePo achieves state-of-the-art performance with an order of magnitude less memory requirement and two orders of magnitude faster training than previous methods.
We show that high-quality distilled data can greatly improve various downstream applications, such as continual learning and membership inference defense.
arXiv Detail & Related papers (2022-06-01T19:02:06Z) - Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation [77.34726150561087]
This article explores various data distillation techniques that can reduce the amount of data required to successfully train deep networks.
Inspired by recent ideas, we suggest new data distillation techniques based on generative teaching networks, gradient matching, and the Implicit Function Theorem.
arXiv Detail & Related papers (2022-03-16T11:45:32Z)
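As noted in the "Data Distillation Can Be Like Vodka" entry above, here is a rough schematic of its progressive (multi-stage) distillation loop. The helpers `distill_subset`, `train_model`, and `new_model` are hypothetical placeholders for a single-stage distillation method, a standard training routine, and a model factory; this sketches the control flow only, not that paper's actual implementation.

```python
import torch

def progressive_distillation(real_data, real_labels, num_stages,
                             distill_subset, train_model, new_model):
    """Sketch: build the distilled set in stages, each conditioned on earlier ones."""
    distilled = []  # list of (images, labels) pairs; their union is the result

    for stage in range(num_stages):
        # Condition the next stage on everything distilled so far by training
        # a fresh model on the cumulative union of previous synthetic subsets.
        model = new_model()
        if distilled:
            prev_x = torch.cat([x for x, _ in distilled])
            prev_y = torch.cat([y for _, y in distilled])
            train_model(model, prev_x, prev_y)

        # Run a single-stage distillation method with this model in the loop
        # to synthesize one more small subset.
        distilled.append(distill_subset(model, real_data, real_labels))

    # The final distilled dataset is the cumulative union of all stages.
    xs = torch.cat([x for x, _ in distilled])
    ys = torch.cat([y for _, y in distilled])
    return xs, ys
```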