You Only Condense Once: Two Rules for Pruning Condensed Datasets
- URL: http://arxiv.org/abs/2310.14019v1
- Date: Sat, 21 Oct 2023 14:05:58 GMT
- Title: You Only Condense Once: Two Rules for Pruning Condensed Datasets
- Authors: Yang He, Lingao Xiao, Joey Tianyi Zhou
- Abstract summary: You Only Condense Once (YOCO) produces smaller condensed datasets with two embarrassingly simple dataset pruning rules.
Experiments validate our findings on networks including ConvNet, ResNet and DenseNet.
- Score: 41.92794134275854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset condensation is a crucial tool for enhancing training efficiency by
reducing the size of the training dataset, particularly in on-device scenarios.
However, these scenarios present two significant challenges: 1) the varying
computational resources available on the devices require a dataset size
different from that of the pre-defined condensed dataset, and 2) the limited
computational resources often preclude the possibility of conducting additional
condensation processes. We introduce You Only Condense Once (YOCO) to overcome
these limitations. On top of one condensed dataset, YOCO produces smaller
condensed datasets with two embarrassingly simple dataset pruning rules: Low
LBPE Score and Balanced Construction. YOCO offers two key advantages: 1) it can
flexibly resize the dataset to fit varying computational constraints, and 2) it
eliminates the need for extra condensation processes, which can be
computationally prohibitive. Experiments validate our findings on networks
including ConvNet, ResNet and DenseNet, and datasets including CIFAR-10,
CIFAR-100 and ImageNet. For example, our YOCO surpassed various dataset
condensation and dataset pruning methods on CIFAR-10 with ten Images Per Class
(IPC), achieving 6.98-8.89% and 6.31-23.92% accuracy gains, respectively. The
code is available at: https://github.com/he-y/you-only-condense-once.
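
As a rough illustration of how the two rules from the abstract compose, the sketch below keeps the lowest-scoring samples of each class so that the pruned set stays class-balanced. It assumes a per-sample LBPE score has already been computed; the function and tensor names are illustrative, not the authors' released implementation (see the linked repository for that).

```python
# Hedged sketch: prune an already-condensed dataset down to a smaller IPC
# using the two rules named in the abstract -- keep samples with low LBPE
# scores, and keep the per-class counts balanced. The scoring step is a
# placeholder; YOCO's actual LBPE computation lives in the official repo.
import torch


def prune_condensed(images: torch.Tensor,
                    labels: torch.Tensor,
                    scores: torch.Tensor,
                    target_ipc: int):
    """Select `target_ipc` samples per class with the lowest scores.

    images:  (N, C, H, W) condensed images
    labels:  (N,) integer class labels
    scores:  (N,) per-sample LBPE scores (lower = kept first, per the
             "Low LBPE Score" rule; how the score is computed is assumed)
    """
    keep = []
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]  # samples of class c
        order = torch.argsort(scores[idx])             # ascending score
        keep.append(idx[order[:target_ipc]])           # balanced construction
    keep = torch.cat(keep)
    return images[keep], labels[keep]


# Usage: shrink a 10-IPC condensed CIFAR-10 set to 5 IPC without re-condensing.
# imgs, lbls = prune_condensed(condensed_imgs, condensed_lbls, lbpe, target_ipc=5)
```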
Related papers
- Elucidating the Design Space of Dataset Condensation [23.545641118984115]
A concept within data-centric learning, dataset condensation efficiently transfers critical attributes from an original dataset to a synthetic version.
We propose a comprehensive design framework that includes specific, effective strategies such as soft category-aware matching.
In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%.
arXiv Detail & Related papers (2024-04-21T18:19:27Z)
- Multisize Dataset Condensation [34.14939894093381]
Multisize dataset condensation improves training efficiency in on-device scenarios.
In this paper, we propose Multisize dataset condensation (MDC) by compressing N condensation processes into a single condensation process.
Our method offers several benefits: 1) no additional condensation process is required; 2) storage requirements are reduced by reusing condensed images.
arXiv Detail & Related papers (2024-03-10T03:43:02Z)
- Dataset Condensation for Recommendation [29.239833773646975]
We propose a lightweight condensation framework tailored for recommendation (DConRec).
We model the discrete user-item interactions via a probabilistic approach and design a pre-augmentation module to incorporate the potential preferences of users into the condensed datasets.
Experimental results on multiple real-world datasets have demonstrated the effectiveness and efficiency of our framework.
arXiv Detail & Related papers (2023-10-02T09:30:11Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
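
Distribution matching, as in the entry above, is commonly implemented by matching class-wise mean features of real and synthetic batches under a randomly initialized embedding network. The sketch below shows only that generic objective; it does not reproduce the paper's specific improvements, and `embed_net` is a placeholder.

```python
# Hedged sketch of a generic distribution-matching loss for condensation:
# per class, match the mean embedding of synthetic images to that of a real
# batch under a randomly initialized feature extractor.
import torch
import torch.nn.functional as F


def distribution_matching_loss(embed_net, real_x, real_y, syn_x, syn_y):
    loss = syn_x.new_zeros(())                 # scalar accumulator on the right device
    with torch.no_grad():
        real_feat = embed_net(real_x)          # real features need no gradient
    syn_feat = embed_net(syn_x)                # gradient flows into the synthetic images
    for c in syn_y.unique():
        real_mask = (real_y == c)
        if real_mask.sum() == 0:
            continue                           # class absent from this real batch
        mu_real = real_feat[real_mask].mean(dim=0)
        mu_syn = syn_feat[syn_y == c].mean(dim=0)
        loss = loss + F.mse_loss(mu_syn, mu_real)  # match class-wise feature means
    return loss
```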
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory [66.035487142452]
We show that trajectory-matching-based methods (MTT) can scale to large-scale datasets such as ImageNet-1K.
We propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with 6x reduction in memory footprint.
The resulting algorithm sets a new SOTA on ImageNet-1K: we can scale up to 50 images per class (IPC) on a single GPU.
arXiv Detail & Related papers (2022-11-19T04:46:03Z)
- Dataset Condensation with Latent Space Knowledge Factorization and Sharing [73.31614936678571]
We introduce a novel approach for solving the dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
arXiv Detail & Related papers (2022-08-21T18:14:08Z)
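
A minimal sketch of the latent-code setup described above: the synthetic set is parameterized as learnable codes plus a small shared decoder, and both are optimized against a condensation objective that is left abstract here. All class counts, layer sizes, and names are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: parameterize the synthetic set as learnable latent codes plus
# a small shared decoder, so structure is shared across images instead of
# optimizing raw pixels. The condensation loss itself (gradient/distribution/
# trajectory matching) is left abstract; all names here are illustrative.
import torch
import torch.nn as nn


class LatentCondensedSet(nn.Module):
    def __init__(self, num_classes=10, ipc=10, code_dim=64, img_shape=(3, 32, 32)):
        super().__init__()
        # One learnable code per synthetic image (num_classes * ipc codes).
        self.codes = nn.Parameter(torch.randn(num_classes * ipc, code_dim))
        self.labels = torch.arange(num_classes).repeat_interleave(ipc)
        # Shared lightweight decoder mapping codes to images.
        c, h, w = img_shape
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 256), nn.ReLU(),
            nn.Linear(256, c * h * w), nn.Unflatten(1, img_shape),
        )

    def forward(self):
        return self.decoder(self.codes), self.labels   # synthetic images + labels


# Both the codes and the decoder are optimized against a condensation loss:
# syn_set = LatentCondensedSet()
# opt = torch.optim.Adam(syn_set.parameters(), lr=1e-3)
# images, labels = syn_set()   # plug into any matching objective
```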
- Condensing Graphs via One-Step Gradient Matching [50.07587238142548]
We propose a one-step gradient matching scheme, which performs gradient matching for only a single step without training the network weights.
Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs.
In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance.
arXiv Detail & Related papers (2022-06-15T18:20:01Z)
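
The one-step scheme in the last entry can be sketched as follows: with a freshly initialized (and never trained) model, the gradient of the loss on synthetic data is pushed toward the gradient on real data, and only the synthetic data is updated. The paper targets graphs and GNNs; the sketch below uses an arbitrary classifier to stay self-contained.

```python
# Hedged sketch of one-step gradient matching: compare the loss gradient on
# real data with the loss gradient on synthetic data under a fixed, randomly
# initialized model, and update only the synthetic data. `model` is any
# differentiable classifier; the paper's actual setup uses GNNs on graphs.
import torch
import torch.nn.functional as F


def one_step_gradient_match(model, real_x, real_y, syn_x, syn_y):
    params = [p for p in model.parameters() if p.requires_grad]

    real_grads = torch.autograd.grad(
        F.cross_entropy(model(real_x), real_y), params)
    syn_grads = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), params, create_graph=True)

    # Sum of per-parameter gradient distances; backprop flows into syn_x only.
    return sum(F.mse_loss(gs, gr.detach()) for gs, gr in zip(syn_grads, real_grads))


# Usage (syn_x is a leaf tensor with requires_grad=True):
# loss = one_step_gradient_match(model, real_x, real_y, syn_x, syn_y)
# loss.backward(); optimizer.step()   # optimizer updates syn_x, not the model
```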