Multisize Dataset Condensation
- URL: http://arxiv.org/abs/2403.06075v2
- Date: Sun, 14 Apr 2024 09:28:28 GMT
- Title: Multisize Dataset Condensation
- Authors: Yang He, Lingao Xiao, Joey Tianyi Zhou, Ivor Tsang
- Abstract summary: Multisize dataset condensation improves training efficiency in on-device scenarios.
In this paper, we propose Multisize Dataset Condensation (MDC), which compresses N condensation processes into a single condensation process.
Our method offers several benefits: 1) no additional condensation process is required; 2) reduced storage requirements by reusing condensed images.
- Score: 34.14939894093381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While dataset condensation effectively enhances training efficiency, its application in on-device scenarios brings unique challenges. 1) Due to the fluctuating computational resources of these devices, there's a demand for a flexible dataset size that diverges from a predefined size. 2) The limited computational power on devices often prevents additional condensation operations. These two challenges connect to the "subset degradation problem" in traditional dataset condensation: a subset from a larger condensed dataset is often unrepresentative compared to directly condensing the whole dataset to that smaller size. In this paper, we propose Multisize Dataset Condensation (MDC) by compressing N condensation processes into a single condensation process to obtain datasets with multiple sizes. Specifically, we introduce an "adaptive subset loss" on top of the basic condensation loss to mitigate the "subset degradation problem". Our MDC method offers several benefits: 1) No additional condensation process is required; 2) reduced storage requirement by reusing condensed images. Experiments validate our findings on networks including ConvNet, ResNet and DenseNet, and datasets including SVHN, CIFAR-10, CIFAR-100 and ImageNet. For example, we achieved 5.22%-6.40% average accuracy gains on condensing CIFAR-10 to ten images per class. Code is available at: https://github.com/he-y/Multisize-Dataset-Condensation.
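The abstract outlines the approach only at a high level: a basic condensation loss keeps the full condensed set faithful to the real data, while an added "adaptive subset loss" keeps nested subsets of the condensed images usable on their own, mitigating the subset degradation problem. The sketch below illustrates that structure under stated assumptions rather than reproducing the paper's algorithm: the base objective here is a generic gradient-matching loss, the condensed images are assumed to be stored class by class with IPC images per class, and `pick_subset_size` is a hypothetical placeholder for the paper's adaptive choice of which subset to optimize at each step.

```python
# Hypothetical sketch of combining a basic condensation loss with a subset loss.
# NOT the authors' implementation: the base objective is plain gradient matching,
# and pick_subset_size() stands in for the paper's adaptive selection rule.
import torch
import torch.nn.functional as F


def gradient_match_loss(model, syn_x, syn_y, real_x, real_y):
    """Cosine distance between model gradients on synthetic and real batches."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_syn = torch.autograd.grad(F.cross_entropy(model(syn_x), syn_y), params,
                                create_graph=True)
    g_real = torch.autograd.grad(F.cross_entropy(model(real_x), real_y), params)
    return sum(1 - F.cosine_similarity(a.flatten(), b.detach().flatten(), dim=0)
               for a, b in zip(g_syn, g_real))


def pick_subset_size(step, max_ipc):
    """Placeholder schedule: cycle through target subset sizes 1..max_ipc."""
    return step % max_ipc + 1


def mdc_style_loss(model, syn_x, syn_y, real_x, real_y, ipc, num_classes, step):
    # Basic condensation loss on the full condensed set.
    loss = gradient_match_loss(model, syn_x, syn_y, real_x, real_y)

    # Subset loss: the first k images per class should also match the real data,
    # so that a smaller prefix of the condensed set remains usable on its own.
    k = pick_subset_size(step, ipc)
    idx = torch.cat([torch.arange(c * ipc, c * ipc + k) for c in range(num_classes)])
    loss = loss + gradient_match_loss(model, syn_x[idx], syn_y[idx], real_x, real_y)
    return loss
```

Because the subset indices in this sketch are simply the first k images of each class, any prefix of the condensed set can later be loaded as a smaller dataset without re-running condensation, which is the storage reuse the abstract refers to.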
Related papers
- Elucidating the Design Space of Dataset Condensation [23.545641118984115]
A concept within data-centric learning, dataset condensation efficiently transfers critical attributes from an original dataset to a synthetic version.
We propose a comprehensive design framework that includes specific, effective strategies like implementing soft category-aware matching.
In our testing, EDC achieves state-of-the-art accuracy, reaching 48.6% on ImageNet-1k with a ResNet-18 model at an IPC of 10, which corresponds to a compression ratio of 0.78%.
arXiv Detail & Related papers (2024-04-21T18:19:27Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- You Only Condense Once: Two Rules for Pruning Condensed Datasets [41.92794134275854]
You Only Condense Once (YOCO) produces smaller condensed datasets with two embarrassingly simple dataset pruning rules.
Experiments validate our findings on networks including ConvNet, ResNet and DenseNet.
arXiv Detail & Related papers (2023-10-21T14:05:58Z)
- Dataset Condensation via Generative Model [71.89427409059472]
We propose to condense large datasets into another format, a generative model.
Such a novel format allows for the condensation of large datasets because the size of the generative model remains relatively stable as the number of classes or image resolution increases.
An intra-class loss and an inter-class loss are proposed to model the relations among condensed samples.
arXiv Detail & Related papers (2023-09-14T13:17:02Z)
- Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching (a generic sketch of the distribution-matching idea appears after this list).
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching [27.313740022587442]
We propose CFNet, a Cascade and Fused cost volume based network to improve the robustness of the stereo matching network.
We employ a variance-based uncertainty estimation to adaptively adjust the next-stage disparity search space.
Our proposed method achieves state-of-the-art overall performance and obtains first place on the stereo task of the Robust Vision Challenge 2020.
arXiv Detail & Related papers (2021-04-09T11:38:59Z)
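For context on the "Improved Distribution Matching" entry above: distribution-matching condensation methods align feature statistics of synthetic and real images under (typically randomly initialized) embedding networks instead of matching gradients or training trajectories. The snippet below is a generic, minimal sketch of that family of objectives, not the specific method of that paper; `embed_net`, `syn_by_class`, and `real_by_class` are illustrative names.

```python
# Generic sketch of a distribution-matching condensation objective: align the
# mean feature embedding of synthetic and real images of each class under an
# embedding network. Illustrates the basic family only, not the improvements
# proposed in "Improved Distribution Matching for Dataset Condensation".
import torch


def distribution_matching_loss(embed_net, syn_by_class, real_by_class):
    """Sum over classes of the squared distance between mean embeddings."""
    loss = torch.zeros(())
    for c, syn_x in syn_by_class.items():
        f_syn = embed_net(syn_x).mean(dim=0)                        # mean synthetic embedding
        f_real = embed_net(real_by_class[c]).mean(dim=0).detach()   # mean real embedding (no grad)
        loss = loss + ((f_syn - f_real) ** 2).sum()
    return loss
```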
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.