Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
- URL: http://arxiv.org/abs/2211.10586v4
- Date: Tue, 31 Oct 2023 19:28:40 GMT
- Title: Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
- Authors: Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh
- Abstract summary: We show that trajectory-matching-based methods (MTT) can scale to large-scale datasets such as ImageNet-1K.
We propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with a ~6x reduction in memory footprint.
The resulting algorithm sets a new SOTA on ImageNet-1K: we can scale up to 50 IPCs (Images Per Class) on a single GPU.
- Score: 66.035487142452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset Distillation is a newly emerging area that aims to distill large
datasets into much smaller and highly informative synthetic ones to accelerate
training and reduce storage. Among various dataset distillation methods,
trajectory-matching-based methods (MTT) have achieved SOTA performance in many
tasks, e.g., on CIFAR-10/100. However, due to exorbitant memory consumption
when unrolling optimization through SGD steps, MTT fails to scale to
large-scale datasets such as ImageNet-1K. Can we scale this SOTA method to
ImageNet-1K and does its effectiveness on CIFAR transfer to ImageNet-1K? To
answer these questions, we first propose a procedure to exactly compute the
unrolled gradient with constant memory complexity, which allows us to scale MTT
to ImageNet-1K seamlessly with ~6x reduction in memory footprint. We further
discover that it is challenging for MTT to handle datasets with a large number
of classes, and propose a novel soft label assignment that drastically improves
its convergence. The resulting algorithm sets new SOTA on ImageNet-1K: we can
scale up to 50 IPCs (Images Per Class) on ImageNet-1K on a single GPU (all
previous methods can only scale to 2 IPCs on ImageNet-1K), leading to the best
accuracy (only 5.9% accuracy drop against full dataset training) while
utilizing only 4.2% of the number of data points - an 18.2% absolute gain over
prior SOTA. Our code is available at https://github.com/justincui03/tesla
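For intuition about the memory bottleneck the abstract describes, the minimal PyTorch sketch below shows the naive unrolled trajectory-matching gradient that MTT relies on, where keeping every inner step's computation graph makes memory grow with the number of SGD steps. The model, tensor shapes, and hyperparameters are hypothetical placeholders, not the authors' implementation, and the paper's exact constant-memory procedure and soft-label assignment are not reproduced here.
```python
# Minimal sketch (not the authors' code) of MTT-style trajectory matching with a
# naively unrolled gradient. create_graph=True keeps every inner step's graph
# alive, so peak memory grows linearly with inner_steps; the paper's procedure
# computes the same gradient exactly without retaining all of these graphs.
import torch
import torch.nn.functional as F


def model_fn(x, params):
    # Hypothetical stand-in for a ConvNet: a single linear classifier.
    w, b = params
    return x.flatten(1) @ w + b


def trajectory_matching_loss(params_start, params_target, syn_images, syn_labels,
                             inner_steps=5, lr=0.01):
    # Unroll SGD on the synthetic set starting from a teacher checkpoint ...
    params = [p.clone().requires_grad_(True) for p in params_start]
    for _ in range(inner_steps):
        loss = F.cross_entropy(model_fn(syn_images, params), syn_labels)
        grads = torch.autograd.grad(loss, params, create_graph=True)  # memory grows per step
        params = [p - lr * g for p, g in zip(params, grads)]
    # ... then match the later teacher checkpoint with a normalized parameter distance.
    num = sum(((p - t) ** 2).sum() for p, t in zip(params, params_target))
    den = sum(((s - t) ** 2).sum() for s, t in zip(params_start, params_target))
    return num / den


# Usage sketch: distill 10 synthetic 32x32 images for a 10-way classifier.
w0, b0 = torch.randn(3 * 32 * 32, 10) * 0.01, torch.zeros(10)
w1, b1 = torch.randn(3 * 32 * 32, 10) * 0.01, torch.zeros(10)  # later teacher checkpoint
syn_images = torch.randn(10, 3, 32, 32, requires_grad=True)
syn_labels = torch.arange(10)
loss = trajectory_matching_loss([w0, b0], [w1, b1], syn_images, syn_labels)
loss.backward()  # populates syn_images.grad for the distillation update
```
The paper's contribution is to compute this same unrolled gradient exactly while keeping memory constant in the number of unrolled steps, which is what allows the method to reach 50 IPCs on ImageNet-1K on a single GPU.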
Related papers
- Dataset Distillation in Large Data Era [31.758821805424393]
We show how to distill various large-scale datasets such as full ImageNet-1K/21K under a conventional input resolution of 224×224.
We show that the proposed model beats the current state-of-the-art by more than 4% Top-1 accuracy on ImageNet-1K/21K.
arXiv Detail & Related papers (2023-11-30T18:59:56Z)
- Improving Resnet-9 Generalization Trained on Small Datasets [4.977981835063451]
The challenge is to achieve the highest possible accuracy in an image classification task in less than 10 minutes.
The training is done on a small dataset of 5000 images picked randomly from the CIFAR-10 dataset.
Our experiments show that ResNet-9 can achieve an accuracy of 88% while trained on only a 10% subset of the CIFAR-10 dataset.
arXiv Detail & Related papers (2023-09-07T18:46:52Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective [27.650434284271363]
Under 50 IPC, our approach achieves the highest validation accuracy of 42.5% and 60.8% on the Tiny-ImageNet and ImageNet-1K datasets, respectively.
Our approach is also approximately 52× (ConvNet-4) and 16× (ResNet-18) faster than MTT, with 11.6× and 6.4× lower memory consumption during data synthesis.
arXiv Detail & Related papers (2023-06-22T17:59:58Z)
- Large-scale Dataset Pruning with Dynamic Uncertainty [28.60845105174658]
The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them.
In this paper, we investigate how to prune large-scale datasets to produce an informative subset for training sophisticated deep models with a negligible performance drop.
arXiv Detail & Related papers (2023-06-08T13:14:35Z)
- SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage [52.317406324182215]
We propose a storage-efficient training strategy for vision classifiers on large-scale datasets.
Our token storage only needs 1% of the original JPEG-compressed raw pixels.
Our experimental results on ImageNet-1k show that our method outperforms other storage-efficient training methods by a large margin.
arXiv Detail & Related papers (2023-03-20T13:55:35Z)
- Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement [68.44100784364987]
We propose a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users.
We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+.
Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks.
arXiv Detail & Related papers (2023-03-15T23:10:17Z)
- Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks.
We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts.
We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z)
- Memory Efficient Meta-Learning with Large Images [62.70515410249566]
Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but meta-training itself is memory-intensive.
This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken.
We propose LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU.
arXiv Detail & Related papers (2021-07-02T14:37:13Z)