Dataset Distillation in Large Data Era
- URL: http://arxiv.org/abs/2311.18838v1
- Date: Thu, 30 Nov 2023 18:59:56 GMT
- Title: Dataset Distillation in Large Data Era
- Authors: Zeyuan Yin and Zhiqiang Shen
- Abstract summary: We show how to distill various large-scale datasets such as full ImageNet-1K/21K under a conventional input resolution of 224$\times$224.
We show that the proposed model beats the current state-of-the-art by more than 4% Top-1 accuracy on ImageNet-1K/21K.
- Score: 31.758821805424393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dataset distillation aims to generate a smaller but representative subset
from a large dataset, which allows a model to be trained efficiently, meanwhile
evaluating on the original testing data distribution to achieve decent
performance. Many prior works have aimed to align with diverse aspects of the
original datasets, such as matching the training weight trajectories, gradients,
feature/BatchNorm distributions, etc. In this work, we show how to distill
various large-scale datasets such as full ImageNet-1K/21K under a conventional
input resolution of 224$\times$224 to achieve the best accuracy over all
previous approaches, including SRe$^2$L, TESLA and MTT. To achieve this, we
introduce a simple yet effective ${\bf C}$urriculum ${\bf D}$ata ${\bf
A}$ugmentation ($\texttt{CDA}$) during data synthesis, which achieves 63.2%
accuracy on large-scale ImageNet-1K under IPC (Images Per Class) 50 and 36.1%
on ImageNet-21K under IPC 20. Finally, we show that, by integrating all
our enhancements together, the proposed model beats the current
state-of-the-art by more than 4% Top-1 accuracy on ImageNet-1K/21K and for the
first time, reduces the gap to its full-data training counterpart to less than
absolute 15%. Moreover, this work represents the inaugural success in dataset
distillation on larger-scale ImageNet-21K under the standard 224$\times$224
resolution. Our code and distilled ImageNet-21K dataset of 20 IPC, 2K recovery
budget are available at https://github.com/VILA-Lab/SRe2L/tree/main/CDA.
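The abstract does not spell out the curriculum schedule itself, so the following is only a plausible illustration of curriculum data augmentation during synthesis: the lower bound of the RandomResizedCrop scale range is annealed so that early recovery steps see large, easy crops and later steps see small, hard ones. The linear schedule, the specific bounds, and the function names below are assumptions for illustration, not the authors' released code (see the repository above for the real implementation).

```python
# Illustrative sketch of a crop-difficulty curriculum during data synthesis.
# All hyper-parameters and the linear schedule are assumptions.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def crop_scale_bounds(step, total_steps, easy_min=0.5, hard_min=0.08, max_scale=1.0):
    """Anneal the lower bound of RandomResizedCrop's scale range:
    early steps use large (easy) crops, later steps allow small (hard) crops."""
    t = step / max(total_steps - 1, 1)
    return (easy_min + t * (hard_min - easy_min), max_scale)

def synthesis_step(x_syn, targets, model, optimizer, step, total_steps):
    """One recovery update on the learnable images x_syn (shape B x 3 x 224 x 224)."""
    aug = T.Compose([
        T.RandomResizedCrop(224, scale=crop_scale_bounds(step, total_steps)),
        T.RandomHorizontalFlip(),
    ])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(aug(x_syn)), targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the actual recovery objective (as in SRe$^2$L), the cross-entropy term is combined with BatchNorm-statistic matching regularizers; those are omitted here for brevity.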
Related papers
- Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching [74.75248610868685]
Teddy is a Taylor-approximated dataset distillation framework designed to handle large-scale datasets.
Teddy attains state-of-the-art efficiency and performance on the Tiny-ImageNet and original-sized ImageNet-1K datasets.
arXiv Detail & Related papers (2024-10-10T03:28:46Z)
- Dataset Distillation via Adversarial Prediction Matching [24.487950991247764]
We propose an adversarial framework to solve the dataset distillation problem efficiently.
Our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets.
arXiv Detail & Related papers (2023-12-14T13:19:33Z)
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3%.
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
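The progressive scheme described above (several small synthetic subsets, each conditioned on the previously distilled ones, with the model trained on their cumulative union) can be outlined as follows. `synthesize_subset` and `train_model` are hypothetical placeholders standing in for any base distillation and training routine; this is a sketch of the loop structure, not the paper's code.

```python
# Hedged outline of progressive dataset distillation (PDD-style).
# `synthesize_subset` and `train_model` are hypothetical callables.
def progressive_dataset_distillation(real_loader, num_stages, ipc_per_stage,
                                     synthesize_subset, train_model):
    synthetic_subsets = []
    model = None
    for stage in range(num_stages):
        # Condition the new subset on everything distilled so far: the base
        # distiller starts from a model trained on the cumulative union.
        new_subset = synthesize_subset(real_loader, ipc_per_stage, init_model=model)
        synthetic_subsets.append(new_subset)
        cumulative = [example for subset in synthetic_subsets for example in subset]
        model = train_model(cumulative)
    # The final model is trained on the union of all synthesized subsets.
    return synthetic_subsets, model
```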
- DataDAM: Efficient Dataset Distillation with Attention Matching [15.300968899043498]
Researchers have long tried to minimize training costs in deep learning while maintaining strong generalization across diverse datasets.
Emerging research on dataset distillation aims to reduce training costs by creating a small synthetic set that contains the information of a larger real dataset.
However, the synthetic data generated by previous methods are not guaranteed to distribute and discriminate as well as the original training data.
arXiv Detail & Related papers (2023-09-29T19:07:48Z)
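The attention matching named in the title is not detailed in the snippet above; a common formulation, used here purely as an illustration, compares channel-aggregated spatial attention maps of real and synthetic batches layer by layer. The attention definition (sum of squared channels) and the plain MSE objective are assumptions, not necessarily the paper's exact loss.

```python
# Hedged sketch of attention matching between real and synthetic feature maps.
import torch
import torch.nn.functional as F

def spatial_attention(feat):
    """(B, C, H, W) features -> (B, H*W) L2-normalized spatial attention maps."""
    att = feat.pow(2).sum(dim=1).flatten(1)   # aggregate over channels
    return F.normalize(att, dim=1)

def attention_matching_loss(real_feats, syn_feats):
    """Distance between batch-averaged attention maps across matched layers.
    real_feats / syn_feats: lists of feature maps taken from the same layers."""
    loss = 0.0
    for fr, fs in zip(real_feats, syn_feats):
        loss = loss + F.mse_loss(spatial_attention(fr).mean(dim=0),
                                 spatial_attention(fs).mean(dim=0))
    return loss
```

In practice the comparison would typically be done per class and the loss backpropagated into the learnable synthetic images.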
- Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective [27.650434284271363]
Under 50 IPC, our approach achieves the highest validation accuracy of 42.5% and 60.8% on the Tiny-ImageNet and ImageNet-1K datasets, respectively.
Our approach is also approximately 52$\times$ (ConvNet-4) and 16$\times$ (ResNet-18) faster than MTT, with 11.6$\times$ and 6.4$\times$ lower memory consumption during data synthesis.
arXiv Detail & Related papers (2023-06-22T17:59:58Z)
- Large-scale Dataset Pruning with Dynamic Uncertainty [28.60845105174658]
The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them.
In this paper, we investigate how to prune large-scale datasets to produce an informative subset for training sophisticated deep models with a negligible performance drop.
arXiv Detail & Related papers (2023-06-08T13:14:35Z)
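The snippet above does not define the pruning score. One plausible reading of "dynamic uncertainty", offered only as an illustration (the paper's exact metric may differ), is to measure how much each example's predicted ground-truth probability fluctuates across training checkpoints and keep the most uncertain examples as the informative subset.

```python
# Illustrative uncertainty-based pruning score; not necessarily the paper's metric.
import numpy as np

def dynamic_uncertainty_scores(prob_history):
    """prob_history: (num_checkpoints, num_examples) array of each example's
    predicted ground-truth-class probability recorded during training.
    Returns one uncertainty score per example (std over checkpoints)."""
    return prob_history.std(axis=0)

def select_subset(prob_history, keep_ratio):
    """Keep the most uncertain (presumed most informative) fraction of the dataset."""
    scores = dynamic_uncertainty_scores(prob_history)
    num_keep = int(len(scores) * keep_ratio)
    return np.argsort(scores)[::-1][:num_keep]   # indices of retained examples
```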
- Dataset Distillation with Convexified Implicit Gradients [69.16247946639233]
We show how implicit gradients can be effectively used to compute meta-gradient updates.
We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural kernel.
arXiv Detail & Related papers (2023-02-13T23:53:16Z)
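For context on the meta-gradient mentioned above: with synthetic data $\mathcal{S}$, inner optimum $w^*(\mathcal{S}) = \arg\min_w \mathcal{L}_{\mathrm{train}}(w, \mathcal{S})$ and outer objective $\mathcal{L}_{\mathrm{val}}(w^*(\mathcal{S}))$, the implicit function theorem gives the standard form of the implicit gradient (our notation, not the paper's):

$$\frac{d\,\mathcal{L}_{\mathrm{val}}}{d\,\mathcal{S}} = -\,\frac{\partial^2 \mathcal{L}_{\mathrm{train}}}{\partial \mathcal{S}\,\partial w}\left(\frac{\partial^2 \mathcal{L}_{\mathrm{train}}}{\partial w\,\partial w}\right)^{-1}\frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial w}\Bigg|_{w = w^*(\mathcal{S})}$$

The convexified approximation replaces the inner problem with a convex one (a linear model on top of a frozen finite-width neural kernel), which keeps the Hessian term well-conditioned and cheaper to invert.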
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory [66.035487142452]
We show that trajectory-matching-based methods (MTT) can scale to large-scale datasets such as ImageNet-1K.
We propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with a 6$\times$ reduction in memory footprint.
The resulting algorithm sets a new SOTA on ImageNet-1K: we can scale up to 50 IPC (Images Per Class) on ImageNet-1K on a single GPU.
arXiv Detail & Related papers (2022-11-19T04:46:03Z)
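The constant-memory property can be illustrated with a deliberately simplified example. The sketch below uses a linear softmax student so the inner-step gradients have a closed form, and it accumulates each inner step's contribution to the meta-gradient before freeing that step's graph. It drops the cross-step (second-order) terms that the paper's exact procedure retains, so it shows only the memory-saving structure, not the exact algorithm.

```python
# Simplified sketch: trajectory matching with per-step gradient accumulation so
# memory does not grow with the number of unrolled steps. Cross-step second-order
# terms are dropped here; the paper's exact derivation keeps them.
import torch
import torch.nn.functional as F

def ce_grad(x, y, w):
    """Closed-form gradient of mean cross-entropy for a linear softmax model x @ w."""
    p = F.softmax(x @ w, dim=1)
    onehot = F.one_hot(y, num_classes=w.shape[1]).to(p.dtype)
    return x.t() @ (p - onehot) / x.shape[0]

def meta_grad_constant_memory(x_syn, y_syn, w0, w_target, lr, num_steps):
    """x_syn: (N, D) learnable synthetic inputs with requires_grad=True,
    y_syn: (N,) integer labels, w0 / w_target: (D, C) start and target
    parameters taken from an expert trajectory."""
    x_syn.grad = None
    # Pass 1: cheap graph-free unroll to obtain the final student parameters w_N.
    with torch.no_grad():
        w = w0.clone()
        for _ in range(num_steps):
            w = w - lr * ce_grad(x_syn, y_syn, w)
        # d/dw_N of ||w_N - w_target||^2 / ||w0 - w_target||^2
        v = 2.0 * (w - w_target) / (w0 - w_target).pow(2).sum()
    # Pass 2: replay the unroll; each step contributes d(-lr * <v, g_t>)/d(x_syn)
    # to the meta-gradient and its graph is freed right after backward().
    w = w0.clone()
    for _ in range(num_steps):
        g = ce_grad(x_syn, y_syn, w)            # graph spans this single step
        (-lr * (v * g).sum()).backward()        # accumulates into x_syn.grad
        w = (w - lr * g).detach()
    return x_syn.grad
```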
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only a single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
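The one-step idea can be sketched generically: at a freshly initialized network, match the gradient induced by the synthetic data to the gradient induced by real data, with no inner training of the network weights. The paper instantiates this for graphs with a GNN and its own distance; the model-agnostic version below, with a cosine-style distance, is only an illustration.

```python
# Model-agnostic sketch of one-step gradient matching (the paper applies it to
# graphs with a GNN). The cosine distance used here is an assumption.
import torch
import torch.nn.functional as F

def one_step_gradient_matching_loss(model, loss_fn, real_batch, syn_batch):
    (x_real, y_real), (x_syn, y_syn) = real_batch, syn_batch
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient on real data: a fixed target, so no graph is kept.
    g_real = torch.autograd.grad(loss_fn(model(x_real), y_real), params)
    g_real = [g.detach() for g in g_real]

    # Gradient on synthetic data: keep the graph so the matching loss can be
    # backpropagated into the learnable synthetic inputs.
    g_syn = torch.autograd.grad(loss_fn(model(x_syn), y_syn), params,
                                create_graph=True)

    # Sum of per-parameter cosine distances between the two gradients.
    return sum(1.0 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0)
               for gr, gs in zip(g_real, g_syn))
```

In a full distillation loop this loss would be backpropagated into the synthetic node features (and, for graphs, the learnable adjacency), averaging over many random initializations of `model` rather than training its weights.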
- AutoSimulate: (Quickly) Learning Synthetic Data Generation [70.82315853981838]
We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
arXiv Detail & Related papers (2020-08-16T11:36:11Z)