Related papers: Efficient Dataset Distillation via Minimax Diffusion

Efficient Dataset Distillation via Minimax Diffusion

URL: http://arxiv.org/abs/2311.15529v2
Date: Mon, 25 Mar 2024 14:52:44 GMT
Title: Efficient Dataset Distillation via Minimax Diffusion
Authors: Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, Yiran Chen,
Abstract summary: We present a theoretical model of the process as hierarchical diffusion control demonstrating the flexibility of the diffusion process to target these criteria. Under the 100-IPC setting on ImageWoof, our method requires less than one-twentieth the distillation time of previous methods, yet yields even better performance.
Score: 24.805804922949832
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dataset distillation reduces the storage and computational consumption of training a network by generating a small surrogate dataset that encapsulates rich information of the original large-scale one. However, previous distillation methods heavily rely on the sample-wise iterative optimization scheme. As the images-per-class (IPC) setting or image resolution grows larger, the necessary computation will demand overwhelming time and resources. In this work, we intend to incorporate generative diffusion techniques for computing the surrogate dataset. Observing that key factors for constructing an effective surrogate dataset are representativeness and diversity, we design additional minimax criteria in the generative training to enhance these facets for the generated images of diffusion models. We present a theoretical model of the process as hierarchical diffusion control demonstrating the flexibility of the diffusion process to target these criteria without jeopardizing the faithfulness of the sample to the desired distribution. The proposed method achieves state-of-the-art validation performance while demanding much less computational resources. Under the 100-IPC setting on ImageWoof, our method requires less than one-twentieth the distillation time of previous methods, yet yields even better performance. Source code and generated data are available in https://github.com/vimar-gu/MinimaxDiffusion.

Related papers

Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory [33.38900857290244]
We present a diversity-driven generative dataset distillation method based on a diffusion model to solve this problem.<n>We introduce self-adaptive memory to align the distribution between distilled and real datasets, assessing the representativeness.<n>Our method outperforms existing state-of-the-art methods in most situations.
arXiv Detail & Related papers (2025-05-26T03:48:56Z)
Generative Dataset Distillation using Min-Max Diffusion Model [11.433714787534582]
This paper addresses the problem of generative dataset distillation that utilizes generative models to synthesize images. We leverage the popular diffusion model as the generator to compute a surrogate dataset, boosted by a min-max loss to control the dataset's diversity and representativeness during training. We observe a critical trade-off between the number of image samples and the image quality controlled by the diffusion steps and propose Diffusion Step Reduction to achieve optimal performance.
arXiv Detail & Related papers (2025-03-24T12:41:40Z)
Efficient Dataset Distillation via Diffusion-Driven Patch Selection for Improved Generalization [34.53986517177061]
We propose a novel framework to existing diffusion-based distillation methods, leveraging diffusion models for selection rather than generation. Our method starts by predicting noise generated by the diffusion model based on input images and text prompts, then calculates the corresponding loss for each pair. This streamlined framework enables a single-step distillation process, and extensive experiments demonstrate that our approach outperforms state-of-the-art methods across various metrics.
arXiv Detail & Related papers (2024-12-13T08:34:46Z)
Model Integrity when Unlearning with T2I Diffusion Models [11.321968363411145]
We propose approximate Machine Unlearning algorithms to reduce the generation of specific types of images, characterized by samples from a forget distribution'' We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines.
arXiv Detail & Related papers (2024-11-04T13:15:28Z)
Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning. Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training. To further improve the generation performance, we employ a class-wise reweighting approach.
arXiv Detail & Related papers (2024-09-27T20:21:19Z)
Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset. We develop constrained diffusion models by imposing diffusion constraints based on desired distributions. We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
One Step Diffusion-based Super-Resolution with Time-Aware Distillation [60.262651082672235]
Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts. Recent techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowledge distillation. We propose a time-aware diffusion distillation method, named TAD-SR, to accomplish effective and efficient image super-resolution.
arXiv Detail & Related papers (2024-08-14T11:47:22Z)
EM Distillation for One-step Diffusion Models [65.57766773137068]
We propose a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of quality. We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilizes the distillation process.
arXiv Detail & Related papers (2024-05-27T05:55:22Z)
Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation [61.03530321578825]
We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fr'echet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models.
arXiv Detail & Related papers (2024-04-05T12:30:19Z)
One Category One Prompt: Dataset Distillation using Diffusion Models [22.512552596310176]
We introduce Diffusion Models (D3M) as a novel paradigm for dataset distillation, leveraging recent advancements in generative text-to-image foundation models. Our approach utilizes textual inversion, a technique for fine-tuning text-to-image generative models, to create concise and informative representations for large datasets.
arXiv Detail & Related papers (2024-03-11T20:23:59Z)
Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching. Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few. We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.