Dataset Distillation as Data Compression: A Rate-Utility Perspective
- URL: http://arxiv.org/abs/2507.17221v1
- Date: Wed, 23 Jul 2025 05:40:52 GMT
- Title: Dataset Distillation as Data Compression: A Rate-Utility Perspective
- Authors: Youneng Bao, Yiping Liu, Zhuo Chen, Yongsheng Liang, Mu Li, Kede Ma
- Abstract summary: We propose a joint rate-utility optimization method for dataset distillation. We parameterize synthetic samples as optimizable latent codes decoded by extremely lightweight networks. We estimate the Shannon entropy of quantized latents as the rate measure and plug in any existing distillation loss as the utility measure, trading them off via a Lagrange multiplier.
- Score: 31.050187201929557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driven by the "scale-is-everything" paradigm, modern machine learning increasingly demands ever-larger datasets and models, yielding prohibitive computational and storage requirements. Dataset distillation mitigates this by compressing an original dataset into a small set of synthetic samples, while preserving its full utility. Yet, existing methods either maximize performance under fixed storage budgets or pursue suitable synthetic data representations for redundancy removal, without jointly optimizing both objectives. In this work, we propose a joint rate-utility optimization method for dataset distillation. We parameterize synthetic samples as optimizable latent codes decoded by extremely lightweight networks. We estimate the Shannon entropy of quantized latents as the rate measure and plug in any existing distillation loss as the utility measure, trading them off via a Lagrange multiplier. To enable fair, cross-method comparisons, we introduce bits per class (bpc), a precise storage metric that accounts for sample, label, and decoder parameter costs. On CIFAR-10, CIFAR-100, and ImageNet-128, our method achieves up to $170\times$ greater compression than standard distillation at comparable accuracy. Across diverse bpc budgets, distillation losses, and backbone architectures, our approach consistently establishes better rate-utility trade-offs.
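The trade-off described in the abstract can be read as a Lagrangian L = utility + λ · rate. Below is a minimal PyTorch sketch under stated assumptions: the rate model is a toy factorized-Gaussian proxy with additive uniform noise (a common relaxation in learned compression; the paper's exact entropy model may differ), and `decoder`, `distill_loss_fn`, and `bits_per_class` are illustrative names, not the authors' API.

```python
import torch
import torch.nn as nn

class FactorizedEntropyProxy(nn.Module):
    """Toy rate model: adds uniform noise as a differentiable stand-in for
    rounding and scores the result under a standard-normal prior, returning
    an estimate of the total bits needed to store the latents."""
    def forward(self, z):
        noisy = z + torch.empty_like(z).uniform_(-0.5, 0.5)  # soft quantization
        log_p = -0.5 * noisy ** 2 - 0.5 * torch.log(torch.tensor(2 * torch.pi))
        return -(log_p / torch.log(torch.tensor(2.0))).sum()  # nats -> bits

def rate_utility_loss(latents, decoder, distill_loss_fn, lam):
    """Joint objective from the abstract: utility + lam * rate, minimized
    over both the latent codes and the lightweight decoder."""
    rate_bits = FactorizedEntropyProxy()(latents)
    synthetic = decoder(latents)          # decode latents into synthetic samples
    utility = distill_loss_fn(synthetic)  # any existing distillation loss plugs in here
    return utility + lam * rate_bits

def bits_per_class(sample_bits, label_bits, decoder_param_bits, num_classes):
    """bpc as defined in the abstract: total storage cost of samples,
    labels, and decoder parameters, divided by the number of classes."""
    return (sample_bits + label_bits + decoder_param_bits) / num_classes
```

Sweeping `lam` traces out the rate-utility curve: larger values favor a smaller bpc at some cost in accuracy.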
Related papers
- Vector-Quantized Soft Label Compression for Dataset Distillation [23.924270023738487]
We present a rigorous analysis of bit requirements across dataset distillation frameworks, quantifying the storage demands of both distilled samples and their soft labels. To address the overhead, we introduce a vector-quantized autoencoder for compressing soft labels, achieving substantial compression while preserving the effectiveness of the distilled data (a minimal sketch of vector-quantizing soft labels appears after this list).
arXiv Detail & Related papers (2026-03-04T07:41:10Z)
- From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness [6.073185086959359]
We propose a unified framework that jointly optimizes dataset compactness and precision under fixed bit budgets. QuADD integrates a differentiable quantization module within the distillation loop, enabling end-to-end co-optimization of synthetic samples and quantization parameters. Our framework supports both uniform and adaptive non-uniform quantization, where the latter learns quantization levels from the data to better represent information-dense regions (a sketch of such a straight-through quantizer appears after this list).
arXiv Detail & Related papers (2026-03-02T21:46:10Z)
- Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation [36.444254126901065]
We propose Rectified Decoupled Dataset Distillation (RD$3$) to generate compact synthetic datasets. RD$3$ provides a foundation for fair and reproducible comparisons in future dataset distillation research.
arXiv Detail & Related papers (2025-09-24T03:47:04Z)
- Efficient Token Compression for Vision Transformer with Spatial Information Preserved [59.79302182800274]
Token compression is essential for reducing the computational and memory requirements of transformer models. We propose an efficient and hardware-compatible token compression method called Prune and Merge.
arXiv Detail & Related papers (2025-03-30T14:23:18Z)
- Dataset Distillation with Neural Characteristic Function: A Minmax Perspective [39.77640775591437]
We reformulate dataset distillation as a minmax optimization problem and introduce Neural Characteristic Function Discrepancy (NCFD), a comprehensive and theoretically grounded metric for measuring distributional differences. Our method achieves significant performance gains over state-of-the-art methods on both low- and high-resolution datasets (a sketch of an empirical characteristic-function discrepancy appears after this list).
arXiv Detail & Related papers (2025-02-28T02:14:55Z)
- Dataset Distillation as Pushforward Optimal Quantization [1.039189397779466]
We propose a simple extension of the state-of-the-art data distillation method D4M, achieving better performance on the ImageNet-1K dataset with trivial additional computation. We demonstrate that, when equipped with an encoder-decoder structure, the empirically successful disentangled methods can be reformulated as an optimal quantization problem. In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems, demonstrating consistency of distilled datasets for diffusion-based generative priors.
arXiv Detail & Related papers (2025-01-13T20:41:52Z)
- Generative Dataset Distillation Based on Self-knowledge Distillation [49.20086587208214]
We present a novel generative dataset distillation method that improves the accuracy of aligning prediction logits. Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data. Our method outperforms existing state-of-the-art methods, resulting in superior distillation performance.
arXiv Detail & Related papers (2025-01-08T00:43:31Z)
- Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation [44.03611131165989]
We propose a novel generative parameterization method dubbed Hierarchical generative Distillation (H-PD). The proposed H-PD achieves a significant performance improvement under various settings with equivalent time consumption. It even surpasses current generative distillation using diffusion models under the extreme compression ratios IPC=1 and IPC=10.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
- Dataset Distillation via Adversarial Prediction Matching [24.487950991247764]
We propose an adversarial framework to solve the dataset distillation problem efficiently.
Our method can produce synthetic datasets just 10% the size of the original, yet achieve, on average, 94% of the test accuracy of models trained on the full original datasets.
arXiv Detail & Related papers (2023-12-14T13:19:33Z)
- Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated-error perturbations when regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)
- Dataset Condensation via Efficient Synthetic-Data Parameterization [40.56817483607132]
Machine learning with massive amounts of data comes at the price of huge computation and storage costs for training and tuning.
Recent studies on dataset condensation attempt to reduce the dependence on such massive data by synthesizing a compact training dataset.
We propose a novel condensation framework that generates multiple synthetic samples under a limited storage budget via efficient parameterization that exploits data regularity.
arXiv Detail & Related papers (2022-05-30T09:55:31Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- Optimizing Vessel Trajectory Compression [71.42030830910227]
In previous work we introduced a trajectory detection module that can provide summarized representations of vessel trajectories by consuming AIS positional messages online.
This methodology can provide reliable trajectory synopses with little deviation from the original course by discarding at least 70% of the raw data as redundant.
However, such trajectory compression is very sensitive to parametrization.
We take into account the type of each vessel in order to provide a suitable configuration that can yield improved trajectory synopses.
arXiv Detail & Related papers (2020-05-11T20:38:56Z)
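For the vector-quantized soft-label entry above, the core compression step can be sketched in a few lines. A minimal sketch, assuming a learned `codebook` of representative soft-label vectors (names and shapes are illustrative, not the paper's API):

```python
import torch

def vq_compress_soft_labels(soft_labels, codebook):
    """Map each soft-label vector (n, C) to the index of its nearest
    codeword (K, C), so storage drops from n*C floats to n small integers
    plus the shared codebook."""
    dists = torch.cdist(soft_labels, codebook)  # (n, K) pairwise L2 distances
    return dists.argmin(dim=1)                  # decode later as codebook[idx]
```

In the paper, such a codebook would come from training a vector-quantized autoencoder rather than being fixed in advance.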
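For the QuADD-style entry, a differentiable quantization module typically relies on a straight-through gradient so that rounding does not block backpropagation. A minimal sketch covering a uniform quantizer and an adaptive non-uniform variant with learnable levels (illustrative, not the paper's exact module):

```python
import torch

def ste_uniform_quantize(x, step):
    """Round x to the nearest multiple of `step`; the straight-through
    trick keeps the forward pass quantized while gradients w.r.t. x pass
    through as if quantization were the identity."""
    q = torch.round(x / step) * step
    return x + (q - x).detach()

def ste_nonuniform_quantize(x, levels):
    """Adaptive variant: snap each value to the nearest learned level.
    Writing the output as q + (x - x.detach()) keeps the quantized value
    in the forward pass, gives x an identity gradient, and lets gradients
    reach the selected entries of `levels` so they can be learned."""
    idx = (x.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    q = levels[idx]
    return q + (x - x.detach())
```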
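For the characteristic-function entry, the discrepancy can be sketched from first principles: compare the empirical characteristic functions E[exp(i t^T x)] of real and synthetic features at a set of frequencies t, which in a minmax scheme would themselves be adversarially optimized. A hedged sketch, omitting the paper's neural parameterization and scaling details:

```python
import torch

def empirical_cf(x, t):
    """Empirical characteristic function of samples x (n, d) at frequency
    vectors t (m, d); returns its real and imaginary parts."""
    proj = x @ t.T  # (n, m) projections t^T x
    return proj.cos().mean(0), proj.sin().mean(0)

def cf_discrepancy(real, synth, t):
    """Mean squared distance |phi_real(t) - phi_synth(t)|^2 over the
    frequencies; the synthetic data minimizes this quantity while, in a
    minmax setup, t is tuned to maximize it."""
    re_r, im_r = empirical_cf(real, t)
    re_s, im_s = empirical_cf(synth, t)
    return ((re_r - re_s) ** 2 + (im_r - im_s) ** 2).mean()
```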
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.