Utility Boundary of Dataset Distillation: Scaling and Configuration-Coverage Laws
- URL: http://arxiv.org/abs/2512.05817v3
- Date: Wed, 10 Dec 2025 12:03:38 GMT
- Title: Utility Boundary of Dataset Distillation: Scaling and Configuration-Coverage Laws
- Authors: Zhengquan Luo, Zhiqiang Xu,
- Abstract summary: It is unclear under what conditions distilled data can retain the effectiveness of full datasets when the training configuration changes. We propose a unified theoretical framework, termed configuration--dynamics--error analysis, which reformulates major DD approaches under a common generalization-error perspective. Our analysis reveals that the various matching methods are interchangeable surrogates that reduce the same generalization error, clarifying why they can all achieve dataset distillation.
- Score: 6.172966466468818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dataset distillation (DD) aims to construct compact synthetic datasets that allow models to achieve comparable performance to full-data training while substantially reducing storage and computation. Despite rapid empirical progress, its theoretical foundations remain limited: existing methods (gradient, distribution, trajectory matching) are built on heterogeneous surrogate objectives and optimization assumptions, which makes it difficult to analyze their common principles or provide general guarantees. Moreover, it is still unclear under what conditions distilled data can retain the effectiveness of full datasets when the training configuration, such as optimizer, architecture, or augmentation, changes. To answer these questions, we propose a unified theoretical framework, termed configuration--dynamics--error analysis, which reformulates major DD approaches under a common generalization-error perspective and provides two main results: (i) a scaling law that provides a single-configuration upper bound, characterizing how the error decreases as the distilled sample size increases and explaining the commonly observed performance saturation effect; and (ii) a coverage law showing that the required distilled sample size scales linearly with configuration diversity, with provably matching upper and lower bounds. In addition, our unified analysis reveals that various matching methods are interchangeable surrogates, reducing the same generalization error, clarifying why they can all achieve dataset distillation and providing guidance on how surrogate choices affect sample efficiency and robustness. Experiments across diverse methods and configurations empirically confirm the derived laws, advancing a theoretical foundation for DD and enabling theory-driven design of compact, configuration-robust dataset distillation.
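The two laws described in the abstract lend themselves to simple empirical checks. The sketch below is a minimal illustration, not the authors' code: it assumes a power-law-plus-floor form for the single-configuration scaling law and a linear relation for the coverage law, and all measurement arrays are hypothetical placeholders rather than values from the paper.

```python
# Minimal sketch (not the authors' implementation): fitting the two laws from
# the abstract. Functional forms and all numbers below are illustrative
# assumptions, not results from the paper.
import numpy as np
from scipy.optimize import curve_fit

# --- Scaling law (single configuration) ---
# Assumed form: error(m) ~ a * m^(-b) + c, which decreases with the distilled
# sample size m and saturates at a floor c (the "performance saturation" effect).
def scaling_law(m, a, b, c):
    return a * np.power(m, -b) + c

# Hypothetical measurements: distilled images-per-class vs. test error.
m = np.array([1, 2, 5, 10, 20, 50, 100], dtype=float)
err = np.array([0.62, 0.51, 0.40, 0.34, 0.30, 0.28, 0.27])

(a, b, c), _ = curve_fit(scaling_law, m, err, p0=(0.5, 0.5, 0.2))
print(f"scaling law fit: error(m) ~ {a:.2f} * m^(-{b:.2f}) + {c:.2f}")

# --- Coverage law (configuration diversity) ---
# Assumed reading: the distilled sample size m*(k) required to stay within a
# fixed error tolerance grows linearly in the number k of training
# configurations (optimizer / architecture / augmentation) to be covered.
k = np.array([1, 2, 4, 8, 16], dtype=float)
m_required = np.array([10, 22, 41, 78, 160], dtype=float)  # hypothetical

slope, intercept = np.polyfit(k, m_required, deg=1)
print(f"coverage law fit: m*(k) ~ {slope:.1f} * k + {intercept:.1f}")
```

Under this reading, a roughly constant per-configuration slope in the second fit is what the coverage law predicts; a clearly sublinear trend would instead suggest that configurations share enough structure to be covered jointly.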
Related papers
- From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness [6.073185086959359]
We propose a unified framework that jointly optimizes dataset compactness and precision under fixed bit budgets. QuADD integrates a differentiable quantization module within the distillation loop, enabling end-to-end co-optimization of synthetic samples and quantization parameters. Our framework supports both uniform and adaptive non-uniform quantization, where the latter learns quantization levels from data to better represent information-dense regions.
arXiv Detail & Related papers (2026-03-02T21:46:10Z) - Nonparametric Data Attribution for Diffusion Models [57.820618036556084]
Data attribution for generative models seeks to quantify the influence of individual training examples on model outputs. We propose a nonparametric attribution method that operates entirely on data, measuring influence via patch-level similarity between generated and training images.
arXiv Detail & Related papers (2025-10-16T03:37:16Z) - Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis [57.614436689939986]
Diffusion Bridge and Flow Matching have both demonstrated compelling empirical performance in transformation between arbitrary distributions. We recast their frameworks through the lens of Optimal Control and prove that the cost function of the Diffusion Bridge is lower than that of Flow Matching. To corroborate these theoretical claims, we propose a novel, powerful architecture for Diffusion Bridge built on a latent Transformer.
arXiv Detail & Related papers (2025-09-29T09:45:22Z) - Rectified Decoupled Dataset Distillation: A Closer Look for Fair and Comprehensive Evaluation [36.444254126901065]
We propose Rectified Decoupled Dataset Distillation (RD$^3$) to generate compact synthetic datasets. RD$^3$ provides a foundation for fair and reproducible comparisons in future dataset distillation research.
arXiv Detail & Related papers (2025-09-24T03:47:04Z) - Partial Transportability for Domain Generalization [56.37032680901525]
Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution. Our contribution is to provide the first general estimation technique for transportability problems. We propose a gradient-based optimization scheme for making scalable inferences in practice.
arXiv Detail & Related papers (2025-03-30T22:06:37Z) - Dataset Distillation as Pushforward Optimal Quantization [2.5892916589735457]
We propose a synthetic training set that achieves similar performance to training on real data, with orders of magnitude lower computational requirements. In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems. We achieve better performance and inter-model generalization on the ImageNet-1K dataset with trivial additional computation, and SOTA performance in higher image-per-class settings.
arXiv Detail & Related papers (2025-01-13T20:41:52Z) - Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation [57.6797306341115]
We take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods. We introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality.
arXiv Detail & Related papers (2024-08-22T15:20:32Z) - Physics-Informed Diffusion Models [0.0]
We present a framework that unifies generative modeling and partial differential equation fulfillment. Our approach reduces the residual error by up to two orders of magnitude compared to previous work in a fluid flow case study.
arXiv Detail & Related papers (2024-03-21T13:52:55Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)