Related papers: Generative Dataset Distillation Based on Self-knowledge Distillation

Generative Dataset Distillation Based on Self-knowledge Distillation

URL: http://arxiv.org/abs/2501.04202v1
Date: Wed, 08 Jan 2025 00:43:31 GMT
Title: Generative Dataset Distillation Based on Self-knowledge Distillation
Authors: Longzhen Li, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama,
Abstract summary: We present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits.<n>Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data.<n>Our method outperforms existing state-of-the-art methods, resulting in superior distillation performance.
Score: 49.20086587208214
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dataset distillation is an effective technique for reducing the cost and complexity of model training while maintaining performance by compressing large datasets into smaller, more efficient versions. In this paper, we present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits. Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data, thereby capturing the overall structure and relationships within the data. To further improve the accuracy of alignment, we introduce a standardization step on the logits before performing distribution matching, ensuring consistency in the range of logits. Through extensive experiments, we demonstrate that our method outperforms existing state-of-the-art methods, resulting in superior distillation performance.

Related papers

Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory [33.38900857290244]
We present a diversity-driven generative dataset distillation method based on a diffusion model to solve this problem.<n>We introduce self-adaptive memory to align the distribution between distilled and real datasets, assessing the representativeness.<n>Our method outperforms existing state-of-the-art methods in most situations.
arXiv Detail & Related papers (2025-05-26T03:48:56Z)
DD-Ranking: Rethinking the Evaluation of Dataset Distillation [223.28392857127733]
We propose DD-Ranking, a unified evaluation framework, along with new general evaluation metrics to uncover the true performance improvements achieved by different methods.<n>By refocusing on the actual information enhancement of distilled datasets, DD-Ranking provides a more comprehensive and fair evaluation standard for future research advancements.
arXiv Detail & Related papers (2025-05-19T16:19:50Z)
Robust Dataset Distillation by Matching Adversarial Trajectories [21.52323435014135]
We introduce the task of robust dataset distillation", a novel paradigm that embeds adversarial robustness into synthetic datasets during the distillation process. We propose Matching Adversarial Trajectories (MAT), a method that integrates adversarial training into trajectory-based dataset distillation. MAT incorporates adversarial samples during trajectory generation to obtain robust training trajectories, which are then used to guide the distillation process.
arXiv Detail & Related papers (2025-03-15T10:02:38Z)
Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data. DSD pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs.
arXiv Detail & Related papers (2025-03-10T17:44:46Z)
Dataset Distillation via Committee Voting [21.018818924580877]
We introduce $bf C$ommittee $bf V$oting for $bf D$ataset $bf D$istillation (CV-DD) CV-DD is a novel approach that leverages the collective wisdom of multiple models or experts to create high-quality distilled datasets.
arXiv Detail & Related papers (2025-01-13T18:59:48Z)
Exploring the potential of prototype-based soft-labels data distillation for imbalanced data classification [0.0]
Main goal is to push further the performance of prototype-based soft-labels distillation in terms of classification accuracy. Experimental studies trace the capability of the method to distill the data, but also the opportunity to act as an augmentation method.
arXiv Detail & Related papers (2024-03-25T19:15:19Z)
Towards Adversarially Robust Dataset Distillation by Curvature Regularization [11.02948004359488]
We study how to embed adversarial robustness in distilled datasets, so that models trained on these datasets maintain the high accuracy and acquire better adversarial robustness.<n>We propose a new method that achieves this goal by incorporating curvature regularization into the distillation process with much less computational overhead than standard adversarial training.
arXiv Detail & Related papers (2024-03-15T06:31:03Z)
Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets. dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset. We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
Dataset Distillation via the Wasserstein Metric [35.32856617593164]
We introduce the Wasserstein distance, a metric grounded in optimal transport theory, to enhance distribution matching in dataset distillation. Our method achieves new state-of-the-art performance across a range of high-resolution datasets.
arXiv Detail & Related papers (2023-11-30T13:15:28Z)
Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching. Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation [96.92250565207017]
We study the data efficiency and selection for the dataset distillation task. By re-formulating the dynamics of distillation, we provide insight into the inherent redundancy in the real dataset. We find the most contributing samples based on their causal effects on the distillation.
arXiv Detail & Related papers (2023-05-28T06:53:41Z)
Explicit and Implicit Knowledge Distillation via Unlabeled Data [5.702176304876537]
We propose an efficient unlabeled sample selection method to replace high computational generators. We also propose a class-dropping mechanism to suppress the label noise caused by the data domain shifts. Experimental results show that our method can quickly converge and obtain higher accuracy than other state-of-the-art methods.
arXiv Detail & Related papers (2023-02-17T09:10:41Z)
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that the weights trained on synthetic data are robust against the accumulated errors perturbations with the regularization towards the flat trajectory. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.