Vector-Quantized Soft Label Compression for Dataset Distillation
- URL: http://arxiv.org/abs/2603.03808v1
- Date: Wed, 04 Mar 2026 07:41:10 GMT
- Title: Vector-Quantized Soft Label Compression for Dataset Distillation
- Authors: Ali Abbasi, Ashkan Shahbazi, Hamed Pirsiavash, Soheil Kolouri
- Abstract summary: We present a rigorous analysis of bit requirements across dataset distillation frameworks, quantifying the storage demands of both distilled samples and their soft labels.
To address the overhead, we introduce a vector-quantized autoencoder for compressing soft labels, achieving substantial compression while preserving the effectiveness of the distilled data.
- Score: 23.924270023738487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dataset distillation is an emerging technique for reducing the computational and storage costs of training machine learning models by synthesizing a small, informative subset of data that captures the essential characteristics of a much larger dataset. Recent methods pair synthetic samples and their augmentations with soft labels from a teacher model, enabling student models to generalize effectively despite the small size of the distilled dataset. While soft labels are critical for effective distillation, the storage and communication overhead they incur, especially when accounting for augmentations, is often overlooked. In practice, each distilled sample is associated with multiple soft labels, making them the dominant contributor to storage costs, particularly in large-class settings such as ImageNet-1K. In this paper, we present a rigorous analysis of bit requirements across dataset distillation frameworks, quantifying the storage demands of both distilled samples and their soft labels. To address the overhead, we introduce a vector-quantized autoencoder (VQAE) for compressing soft labels, achieving substantial compression while preserving the effectiveness of the distilled data. We validate our method on both vision and language distillation benchmarks. On ImageNet-1K, our proposed VQAE achieves 30--40x additional compression over RDED, LPLD, SRE2L, and CDA baselines while retaining over $90\%$ of their original performance.
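As a rough illustration of the vector-quantized autoencoder idea from the abstract, a PyTorch sketch might look like the following. This is not the authors' implementation: the module name, network widths, codebook size, and latent dimension are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftLabelVQAE(nn.Module):
    """Hypothetical VQ autoencoder for compressing soft labels (illustrative only)."""
    def __init__(self, num_classes=1000, latent_dim=16, codebook_size=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_classes, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, num_classes))
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def quantize(self, z):
        # Nearest-neighbor lookup in the learned codebook.
        dists = torch.cdist(z, self.codebook.weight)   # (B, K)
        idx = dists.argmin(dim=-1)                     # (B,)
        z_q = self.codebook(idx)                       # (B, D)
        # Straight-through estimator: gradients flow around the argmin.
        return z + (z_q - z).detach(), idx

    def forward(self, soft_labels):
        z = self.encoder(soft_labels)
        z_q, idx = self.quantize(z)
        logits = self.decoder(z_q)
        # Reconstruction loss in distribution space, plus the standard VQ
        # codebook and (weighted) commitment losses.
        recon = F.kl_div(F.log_softmax(logits, dim=-1), soft_labels,
                         reduction="batchmean")
        z_q_raw = self.codebook(idx)
        vq = F.mse_loss(z_q_raw, z.detach()) + 0.25 * F.mse_loss(z, z_q_raw.detach())
        return recon + vq, idx  # idx is all that must be stored per soft label
```

Under these assumptions, each 1000-dimensional soft label is stored as a single codebook index (log2(512) = 9 bits, or a few indices if several latent vectors are used per label) instead of 1000 floating-point values, which is where the compression comes from.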
Related papers
- Rectifying Soft-Label Entangled Bias in Long-Tailed Dataset Distillation [39.47633542394261]
We emphasize the critical role of soft labels in long-tailed dataset distillation.
We derive an imbalance-aware generalization bound for models trained on the distilled dataset.
We then identify two primary sources of soft-label bias, which originate from the distillation model and the distilled images.
We propose ADSA, an Adaptive Soft-label Alignment module that calibrates the entangled biases.
arXiv Detail & Related papers (2025-11-22T04:37:27Z)
- Dataset Distillation as Data Compression: A Rate-Utility Perspective [31.050187201929557]
We propose a joint rate-utility optimization method for dataset distillation.
We parameterize synthetic samples as optimizable latent codes decoded by extremely lightweight networks.
We estimate the Shannon entropy of the quantized latents as the rate measure and plug in any existing distillation loss as the utility measure, trading them off via a Lagrange multiplier (see the sketch below).
arXiv Detail & Related papers (2025-07-23T05:40:52Z)
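A loose illustration of the rate-utility trade-off described in this entry. The fixed standard-normal prior below is a common stand-in for whatever entropy model the paper actually learns, and all names are illustrative:

```python
import torch

def rate_utility_loss(latents, distill_loss, lam=0.1):
    # Straight-through rounding: quantize to integers in the forward pass
    # while letting gradients reach the latent codes.
    q = latents + (latents.round() - latents).detach()
    # Differentiable rate proxy: bits needed to code q under a standard
    # normal prior convolved with unit-width quantization noise (assumed
    # here in place of a learned entropy model).
    prior = torch.distributions.Normal(0.0, 1.0)
    likelihood = (prior.cdf(q + 0.5) - prior.cdf(q - 0.5)).clamp_min(1e-9)
    rate_bits = -torch.log2(likelihood).sum()
    # Lagrangian: utility (any existing distillation loss) plus lam * rate.
    return distill_loss + lam * rate_bits
```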
- Rethinking Large-scale Dataset Compression: Shifting Focus From Labels to Images [60.42768987736088]
We introduce a benchmark that equitably evaluates methodologies across both the distillation and pruning literatures.
Our benchmark reveals that in the mainstream dataset distillation setting for large-scale datasets, even randomly selected subsets can achieve surprisingly competitive performance.
We propose a new framework for dataset compression, termed Prune, Combine, and Augment (PCA), which focuses on leveraging image data exclusively.
arXiv Detail & Related papers (2025-02-10T13:11:40Z)
- Heavy Labels Out! Dataset Distillation with Label Space Lightening [69.67681224137561]
HeLlO learns effective image-to-label projectors, with which synthetic labels can be generated online directly from synthetic images.
We demonstrate that with only about 0.003% of the original storage required for a complete set of soft labels, we achieve comparable performance to current state-of-the-art dataset distillation methods on large-scale datasets (see the sketch below).
arXiv Detail & Related papers (2024-08-15T15:08:58Z)
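The storage saving comes from keeping a tiny projector instead of every soft label. A minimal sketch of that idea, assuming a shared pretrained feature extractor; HeLlO's actual projector design differs in its details:

```python
import torch
import torch.nn as nn

class LabelProjector(nn.Module):
    """Tiny head stored with the distilled images instead of all soft labels."""
    def __init__(self, backbone, feat_dim=512, num_classes=1000):
        super().__init__()
        self.backbone = backbone                      # shared pretrained model, stored once
        self.head = nn.Linear(feat_dim, num_classes)  # the only per-dataset weights

    def forward(self, synthetic_images):
        with torch.no_grad():                         # features from the frozen backbone
            feats = self.backbone(synthetic_images)
        # Soft labels are generated on the fly, so none need to be stored.
        return self.head(feats).softmax(dim=-1)
```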
- GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost [7.05277588099645]
Recent advancements in dataset distillation have demonstrated the significant benefits of employing soft labels generated by pre-trained teacher models.
We take a novel perspective by emphasizing the full utilization of labels.
We introduce GIFT, which encompasses soft-label refinement and a cosine-similarity-based loss function (see the sketch below).
arXiv Detail & Related papers (2024-05-23T16:02:30Z)
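A hedged sketch of the two ingredients named above. The refinement rule here, blending teacher soft labels with the one-hot ground truth, is an assumption for illustration and not necessarily GIFT's exact formulation:

```python
import torch.nn.functional as F

def refine_labels(soft_labels, hard_labels, num_classes=1000, alpha=0.9):
    # Illustrative refinement: blend teacher soft labels with the one-hot
    # ground truth (an assumed rule, not necessarily the paper's).
    one_hot = F.one_hot(hard_labels, num_classes).float()
    return alpha * soft_labels + (1.0 - alpha) * one_hot

def gift_style_loss(student_logits, refined_labels):
    # 1 - cosine similarity between the student's predicted distribution and
    # the refined soft labels, averaged over the batch.
    cos = F.cosine_similarity(student_logits.softmax(dim=-1), refined_labels, dim=-1)
    return (1.0 - cos).mean()
```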
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that improves distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality [78.6359306550245]
We argue that using just one synthetic subset for distillation will not yield optimal generalization performance.
PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets.
Our experiments show that PDD can effectively improve the performance of existing dataset distillation methods by up to 4.3% (the progressive scheme is sketched below).
arXiv Detail & Related papers (2023-10-10T20:04:44Z)
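A runnable schematic of the progressive scheme. The synthesis stub below stands in for any base distillation method; only the outer loop reflects PDD's idea:

```python
from typing import List
import torch

def synthesize_subset(real_data: torch.Tensor, prior: List[torch.Tensor]) -> torch.Tensor:
    # Stub: a real implementation would optimize this subset so that, jointly
    # with the already-fixed `prior` subsets, it distills the real data well.
    return real_data[torch.randperm(len(real_data))[:10]].clone()

real_data = torch.randn(1000, 3, 32, 32)
subsets: List[torch.Tensor] = []
for stage in range(5):
    # Each stage is conditioned on the union of all previously synthesized sets.
    subsets.append(synthesize_subset(real_data, prior=subsets))

# Students train on the cumulative union of the subsets.
distilled = torch.cat(subsets)
```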
- Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study data efficiency and sample selection for the dataset distillation task.
By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset.
We identify the samples that contribute most, based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
- Gradient-Free Structured Pruning with Unlabeled Data [57.999191898036706]
We propose a gradient-free structured pruning framework that uses only unlabeled data.
The original FLOP count can be reduced by up to 40% with less than a 4% accuracy loss across all tasks considered (see the sketch below).
arXiv Detail & Related papers (2023-03-07T19:12:31Z)
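A hedged sketch of structured pruning driven only by unlabeled data. The scoring rule here, mean absolute activation, is an assumption for illustration and not the paper's exact criterion:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def prune_linear_by_activation(layer: nn.Linear, unlabeled_x: torch.Tensor,
                               keep_ratio: float = 0.6):
    # Score each output neuron by its mean absolute activation on unlabeled
    # inputs; no labels or gradients are required.
    scores = layer(unlabeled_x).abs().mean(dim=0)
    k = max(1, int(keep_ratio * layer.out_features))
    keep = scores.topk(k).indices.sort().values
    # Build a smaller layer with only the kept rows (structured pruning).
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    pruned.weight.copy_(layer.weight[keep])
    if layer.bias is not None:
        pruned.bias.copy_(layer.bias[keep])
    return pruned, keep  # `keep` tells the next layer which inputs remain
```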