DiM: Distilling Dataset into Generative Model
- URL: http://arxiv.org/abs/2303.04707v2
- Date: Wed, 11 Oct 2023 11:30:32 GMT
- Title: DiM: Distilling Dataset into Generative Model
- Authors: Kai Wang, Jianyang Gu, Daquan Zhou, Zheng Zhu, Wei Jiang and Yang You
- Abstract summary: We propose a novel distillation scheme to Distill information of large training sets into generative Models, named DiM.
During the distillation phase, we minimize the differences in logits predicted by a pool of models between real and generated images.
At the deployment stage, the generative model synthesizes various training samples from random noises on the fly.
- Score: 42.32433831074992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation reduces the network training cost by synthesizing small
and informative datasets from large-scale ones. Despite the success of the
recent dataset distillation algorithms, three drawbacks still limit their wider
application: i). the synthetic images perform poorly on large architectures;
ii). they need to be re-optimized when the distillation ratio changes; iii).
the limited diversity restricts the performance when the distillation ratio is
large. In this paper, we propose a novel distillation scheme to
\textbf{D}istill information of large train sets \textbf{i}nto generative
\textbf{M}odels, named DiM. Specifically, DiM learns to use a generative model
to store the information of the target dataset. During the distillation phase,
we minimize the differences in logits predicted by a pool of models between real
and generated images. At the deployment stage, the generative model synthesizes
various training samples from random noises on the fly. Due to the simple yet
effective designs, the trained DiM can be directly applied to different
distillation ratios and large architectures without extra cost. We validate the
proposed DiM across 4 datasets and achieve state-of-the-art results on all of
them. To the best of our knowledge, we are the first to achieve higher accuracy
on complex architectures than simple ones, such as 75.1\% with ResNet-18 and
72.6\% with ConvNet-3 at ten images per class on CIFAR-10. Besides, DiM
outperforms previous methods by 10\% $\sim$ 22\% at 1 and 10 images per class
on the SVHN dataset.
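To make the two stages above concrete, here is a minimal PyTorch sketch of the distillation-phase logit matching and the deployment-stage sampling. The conditional-generator interface `generator(z, labels)`, its `latent_dim` attribute, the plain MSE logit-matching loss, and the uniform sampling of a single classifier from the pool are illustrative assumptions for this sketch, not the authors' exact implementation.
```python
# Hedged sketch of the two DiM stages described in the abstract.
# Assumptions (not from the paper): a conditional generator callable as
# generator(z, labels) exposing .latent_dim, a plain MSE logit-matching
# loss, and uniform sampling of one classifier per step from the pool.
import torch
import torch.nn.functional as F

def distillation_step(generator, model_pool, real_images, real_labels, optimizer):
    """One distillation-phase step: match the logits a pooled classifier
    predicts on real images and on images produced by the generator."""
    generator.train()
    classifier = model_pool[torch.randint(len(model_pool), (1,)).item()]
    classifier.eval()

    # Synthesize samples for the same labels from random noise.
    z = torch.randn(real_images.size(0), generator.latent_dim,
                    device=real_images.device)
    fake_images = generator(z, real_labels)

    # Minimize the difference between logits on real and generated images.
    with torch.no_grad():
        real_logits = classifier(real_images)
    fake_logits = classifier(fake_images)
    loss = F.mse_loss(fake_logits, real_logits)

    optimizer.zero_grad()  # the optimizer holds only the generator's parameters
    loss.backward()
    optimizer.step()
    return loss.item()

def deployment_sampling(generator, images_per_class, num_classes, device="cpu"):
    """Deployment stage: synthesize a training set of any size on the fly,
    so the same generator serves any distillation ratio."""
    generator.eval()
    labels = torch.arange(num_classes, device=device).repeat_interleave(images_per_class)
    with torch.no_grad():
        z = torch.randn(labels.size(0), generator.latent_dim, device=device)
        images = generator(z, labels)
    return images, labels
```
Because deployment only draws fresh samples from the trained generator, changing the distillation ratio or the downstream architecture requires no re-optimization, which is the property the abstract emphasizes.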
Related papers
- Data-to-Model Distillation: Data-Efficient Learning Framework [14.44010988811002]
We propose a novel framework called Data-to-Model Distillation (D2M) to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model.
Our method effectively scales up to high-resolution 128x128 ImageNet-1K.
arXiv Detail & Related papers (2024-11-19T20:10:28Z)
- One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Dataset Distillation using Diffusion Models (D3M), a novel paradigm for dataset distillation that leverages recent advancements in generative text-to-image foundation models.
Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z)
- Latent Dataset Distillation with Diffusion Models [9.398135472047132]
This paper proposes Latent Dataset Distillation with Diffusion Models (LD3M).
Our novel diffusion process is tailored for this task and significantly improves the gradient flow for distillation.
Overall, LD3M consistently outperforms state-of-the-art methods by up to 4.8 p.p. and 4.2 p.p. for 1 and 10 images per class, respectively.
arXiv Detail & Related papers (2024-03-06T17:41:41Z)
- Dataset Distillation via Adversarial Prediction Matching [24.487950991247764]
We propose an adversarial framework to solve the dataset distillation problem efficiently.
Our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets.
arXiv Detail & Related papers (2023-12-14T13:19:33Z)
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
The proposed progressive dataset distillation (PDD) synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets (a minimal sketch of this scheme appears after this list).
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- MIMIC: Masked Image Modeling with Image Correspondences [29.8154890262928]
Current methods for building effective pretraining datasets rely on annotated 3D meshes, point clouds, and camera parameters from simulated environments.
We propose a pretraining dataset-curation approach that does not require any additional annotations.
Our method allows us to generate multi-view datasets from both real-world videos and simulated environments at scale.
arXiv Detail & Related papers (2023-06-27T00:40:12Z)
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the most influential samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Delving Deeper into Data Scaling in Masked Image Modeling [145.36501330782357]
We conduct an empirical study on the scaling capability of masked image modeling (MIM) methods for visual recognition.
Specifically, we utilize the web-collected Coyo-700M dataset.
Our goal is to investigate how the performance changes on downstream tasks when scaling with different sizes of data and models.
arXiv Detail & Related papers (2023-05-24T15:33:46Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
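To make the progressive scheme summarized in the "Data Distillation Can Be Like Vodka" entry concrete, below is a minimal sketch of multi-stage synthesis over a cumulative union of subsets. The `synthesize_subset` and `train_on` callables are hypothetical placeholders for whichever base distillation method and training routine are plugged in; they are not part of the paper.
```python
# Illustrative sketch of progressive, multi-stage dataset distillation:
# each new synthetic subset is optimized given the subsets produced so far,
# and the model is trained on the cumulative union of all subsets.
# `synthesize_subset` and `train_on` are hypothetical placeholder callables.

def progressive_distillation(real_dataset, num_stages, subset_size,
                             synthesize_subset, train_on, model):
    subsets = []  # synthetic subsets accumulated across stages
    for stage in range(num_stages):
        # Condition the new subset on everything synthesized so far.
        new_subset = synthesize_subset(real_dataset, subset_size,
                                       previous=subsets)
        subsets.append(new_subset)
        # Train on the cumulative union of the subsets produced so far.
        cumulative = [sample for subset in subsets for sample in subset]
        train_on(model, cumulative)
    return subsets, model
```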