Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
- URL: http://arxiv.org/abs/2211.10586v4
- Date: Tue, 31 Oct 2023 19:28:40 GMT
- Title: Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
- Authors: Justin Cui, Ruochen Wang, Si Si, Cho-Jui Hsieh
- Abstract summary: We show that trajectory-matching-based methods (MTT) can scale to large-scale datasets such as ImageNet-1K.
We propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with a ~6x reduction in memory footprint.
The resulting algorithm sets a new SOTA on ImageNet-1K: we can scale up to 50 IPCs (Images Per Class) on a single GPU.
- Score: 66.035487142452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset Distillation is a newly emerging area that aims to distill large
datasets into much smaller and highly informative synthetic ones to accelerate
training and reduce storage. Among various dataset distillation methods,
trajectory-matching-based methods (MTT) have achieved SOTA performance in many
tasks, e.g., on CIFAR-10/100. However, due to exorbitant memory consumption
when unrolling optimization through SGD steps, MTT fails to scale to
large-scale datasets such as ImageNet-1K. Can we scale this SOTA method to
ImageNet-1K and does its effectiveness on CIFAR transfer to ImageNet-1K? To
answer these questions, we first propose a procedure to exactly compute the
unrolled gradient with constant memory complexity, which allows us to scale MTT
to ImageNet-1K seamlessly with ~6x reduction in memory footprint. We further
discover that it is challenging for MTT to handle datasets with a large number
of classes, and propose a novel soft label assignment that drastically improves
its convergence. The resulting algorithm sets new SOTA on ImageNet-1K: we can
scale up to 50 IPCs (Images Per Class) on ImageNet-1K on a single GPU (all
previous methods can only scale to 2 IPCs on ImageNet-1K), leading to the best
accuracy (only 5.9% accuracy drop against full dataset training) while
utilizing only 4.2% of the number of data points - an 18.2% absolute gain over
prior SOTA. Our code is available at https://github.com/justincui03/tesla
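For intuition about the memory bottleneck the abstract describes, the minimal PyTorch sketch below shows the naive unrolled trajectory-matching gradient that MTT relies on, where keeping every inner step's computation graph makes memory grow with the number of SGD steps. The model, tensor shapes, and hyperparameters are hypothetical placeholders, not the authors' implementation, and the paper's exact constant-memory procedure and soft-label assignment are not reproduced here.
```python
# Minimal sketch (not the authors' code) of MTT-style trajectory matching with a
# naively unrolled gradient. create_graph=True keeps every inner step's graph
# alive, so peak memory grows linearly with inner_steps; the paper's procedure
# computes the same gradient exactly without retaining all of these graphs.
import torch
import torch.nn.functional as F


def model_fn(x, params):
    # Hypothetical stand-in for a ConvNet: a single linear classifier.
    w, b = params
    return x.flatten(1) @ w + b


def trajectory_matching_loss(params_start, params_target, syn_images, syn_labels,
                             inner_steps=5, lr=0.01):
    # Unroll SGD on the synthetic set starting from a teacher checkpoint ...
    params = [p.clone().requires_grad_(True) for p in params_start]
    for _ in range(inner_steps):
        loss = F.cross_entropy(model_fn(syn_images, params), syn_labels)
        grads = torch.autograd.grad(loss, params, create_graph=True)  # memory grows per step
        params = [p - lr * g for p, g in zip(params, grads)]
    # ... then match the later teacher checkpoint with a normalized parameter distance.
    num = sum(((p - t) ** 2).sum() for p, t in zip(params, params_target))
    den = sum(((s - t) ** 2).sum() for s, t in zip(params_start, params_target))
    return num / den


# Usage sketch: distill 10 synthetic 32x32 images for a 10-way classifier.
w0, b0 = torch.randn(3 * 32 * 32, 10) * 0.01, torch.zeros(10)
w1, b1 = torch.randn(3 * 32 * 32, 10) * 0.01, torch.zeros(10)  # later teacher checkpoint
syn_images = torch.randn(10, 3, 32, 32, requires_grad=True)
syn_labels = torch.arange(10)
loss = trajectory_matching_loss([w0, b0], [w1, b1], syn_images, syn_labels)
loss.backward()  # populates syn_images.grad for the distillation update
```
The paper's contribution is to compute this same unrolled gradient exactly while keeping memory constant in the number of unrolled steps, which is what allows the method to reach 50 IPCs on ImageNet-1K on a single GPU.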
Related papers
- Dataset Distillation in Large Data Era [31.758821805424393]
We show how to distill various large-scale datasets such as full ImageNet-1K/21K under a conventional input resolution of 224×224.
We show that the proposed model beats the current state-of-the-art by more than 4% Top-1 accuracy on ImageNet-1K/21K.
arXiv Detail & Related papers (2023-11-30T18:59:56Z)
- Improving Resnet-9 Generalization Trained on Small Datasets [4.977981835063451]
The challenge is to achieve the highest possible accuracy in an image classification task in less than 10 minutes.
The training is done on a small dataset of 5000 images picked randomly from the CIFAR-10 dataset.
Our experiments show that ResNet-9 can achieve an accuracy of 88% while trained on only a 10% subset of the CIFAR-10 dataset.
arXiv Detail & Related papers (2023-09-07T18:46:52Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective [27.650434284271363]
Under 50 IPC, our approach achieves the highest validation accuracy of 42.5% and 60.8% on the Tiny-ImageNet and ImageNet-1K datasets, respectively.
Our approach is also approximately 52× (ConvNet-4) and 16× (ResNet-18) faster than MTT, with 11.6× and 6.4× lower memory consumption during data synthesis.
arXiv Detail & Related papers (2023-06-22T17:59:58Z)
- Large-scale Dataset Pruning with Dynamic Uncertainty [28.60845105174658]
The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them.
In this paper, we investigate how to prune large-scale datasets to produce an informative subset for training sophisticated deep models with a negligible performance drop.
arXiv Detail & Related papers (2023-06-08T13:14:35Z)
- SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage [52.317406324182215]
We propose a storage-efficient training strategy for vision classifiers on large-scale datasets.
Our token storage only needs 1% of the original JPEG-compressed raw pixels.
Our experimental results on ImageNet-1k show that our method outperforms other storage-efficient training methods by a large margin.
arXiv Detail & Related papers (2023-03-20T13:55:35Z)
- Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement [68.44100784364987]
We propose a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users.
We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+.
Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks.
arXiv Detail & Related papers (2023-03-15T23:10:17Z)
- Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks.
We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts.
We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z)
- Memory Efficient Meta-Learning with Large Images [62.70515410249566]
Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but meta-training itself is memory-intensive.
This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken.
We propose LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU.
arXiv Detail & Related papers (2021-07-02T14:37:13Z)