DiRe: Diversity-promoting Regularization for Dataset Condensation
- URL: http://arxiv.org/abs/2512.13083v1
- Date: Mon, 15 Dec 2025 08:33:44 GMT
- Title: DiRe: Diversity-promoting Regularization for Dataset Condensation
- Authors: Saumyaranjan Mohanty, Aravind Reddy, Konda Reddy Mopuri
- Abstract summary: We propose an intuitive Diversity Regularizer (DiRe) composed of cosine similarity and Euclidean distance. DiRe can be applied off-the-shelf to various state-of-the-art condensation methods. We demonstrate that the addition of our regularizer improves state-of-the-art condensation methods on various benchmark datasets.
- Score: 5.276232626689568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Dataset Condensation, the goal is to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods synthesize datasets with significant redundancy, so there is a dire need to reduce redundancy and improve the diversity of the synthesized datasets. To tackle this, we propose an intuitive Diversity Regularizer (DiRe) composed of cosine similarity and Euclidean distance, which can be applied off-the-shelf to various state-of-the-art condensation methods. Through extensive experiments, we demonstrate that the addition of our regularizer improves state-of-the-art condensation methods on various benchmark datasets from CIFAR-10 to ImageNet-1K with respect to generalization and diversity metrics.
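The abstract describes the regularizer only as a combination of cosine similarity and Euclidean distance, so the following is a minimal sketch of what such a term could look like, not the authors' formulation. The function name `diversity_penalty`, the weights `alpha` and `beta`, the use of mean pairwise statistics, and the sign convention are all assumptions made for illustration.

```python
import numpy as np


def diversity_penalty(features, alpha=1.0, beta=1.0, eps=1e-8):
    """Hypothetical diversity term for a set of synthetic samples.

    features: array of shape (n, d), one row per synthetic sample (n >= 2).
    Returns a scalar that is lower when the samples are more diverse:
    high pairwise cosine similarity is penalized, and large pairwise
    Euclidean distance is rewarded. This is an illustrative assumption,
    not the formula from the paper.
    """
    x = np.asarray(features, dtype=np.float64)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two samples to measure diversity")

    # Mask selecting every ordered pair (i, j) with i != j.
    off_diag = ~np.eye(n, dtype=bool)

    # Mean pairwise cosine similarity between distinct samples.
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    mean_cos = (unit @ unit.T)[off_diag].mean()

    # Mean pairwise Euclidean distance between distinct samples.
    diffs = x[:, None, :] - x[None, :, :]
    mean_dist = np.linalg.norm(diffs, axis=-1)[off_diag].mean()

    return alpha * mean_cos - beta * mean_dist


# Four identical samples versus four samples pointing in different directions.
redundant = np.ones((4, 3))
spread = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [-1.0, 0.0, 0.0]])
print(diversity_penalty(redundant), diversity_penalty(spread))
```

In a condensation pipeline, a term of this kind would typically be added to the method's existing loss with a small weight, and computed per class on feature embeddings or on flattened images. Which of these the paper uses is not stated in the abstract.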
Related papers
- Efficient Dataset Distillation through Low-Rank Space Sampling [34.29086540681496]
This paper proposes a dataset distillation method based on Matching Training Trajectories with Low-rank Space Sampling. The synthetic data is represented by basis vectors and shared dimension mappers from these subspaces. The proposed method is tested on CIFAR-10, CIFAR-100, and SVHN datasets, and outperforms the baseline methods by an average of 9.9%.
arXiv Detail & Related papers (2025-03-11T02:59:17Z)
- Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation [20.556083321381514]
Data-free knowledge distillation (DFKD) has emerged as a pivotal technique in the domain of model compression.
This paper introduces an innovative approach to DFKD through diverse diffusion augmentation (DDA).
Comprehensive experiments conducted on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets showcase the superior performance of our method.
arXiv Detail & Related papers (2024-10-23T07:01:16Z)
- Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment [39.137060714048175]
We argue that enhancing diversity can improve the parallelizable yet isolated approach to synthesizing datasets.
We introduce a novel method that employs dynamic and directed weight adjustment techniques to modulate the synthesis process.
Our method ensures that each batch of synthetic data mirrors the characteristics of a large, varying subset of the original dataset.
arXiv Detail & Related papers (2024-09-26T08:03:19Z)
- Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation [57.6797306341115]
We take an initial step towards understanding various matching-based DD methods from the perspective of sample difficulty. We then extend the neural scaling laws of data pruning to DD to theoretically explain these matching-based methods. We introduce the Sample Difficulty Correction (SDC) approach, designed to predominantly generate easier samples to achieve higher dataset quality.
arXiv Detail & Related papers (2024-08-22T15:20:32Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching [5.2193774924981176]
Training advanced deep hashing models has become more expensive due to complex optimizations and large datasets. We propose IEM (Information-intensive feature Embedding Matching), which is centered on distribution matching and incorporates model and data augmentation techniques to further enhance the feature of hashing space.
arXiv Detail & Related papers (2023-05-29T13:23:55Z)
- Dataset Condensation with Latent Space Knowledge Factorization and Sharing [73.31614936678571]
We introduce a novel approach for solving the dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
arXiv Detail & Related papers (2022-08-21T18:14:08Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- Dataset Condensation via Efficient Synthetic-Data Parameterization [40.56817483607132]
Machine learning with massive amounts of data comes at a price of huge computation costs and storage for training and tuning.
Recent studies on dataset condensation attempt to reduce the dependence on such massive data by synthesizing a compact training dataset.
We propose a novel condensation framework that generates multiple synthetic data with a limited storage budget via efficient parameterization considering data regularity.
arXiv Detail & Related papers (2022-05-30T09:55:31Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
- Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv Detail & Related papers (2020-10-10T10:10:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.