Taming Diffusion for Dataset Distillation with High Representativeness
- URL: http://arxiv.org/abs/2505.18399v1
- Date: Fri, 23 May 2025 22:05:59 GMT
- Title: Taming Diffusion for Dataset Distillation with High Representativeness
- Authors: Lin Zhao, Yushu Wu, Xinru Jiang, Jianyang Gu, Yanzhi Wang, Xiaolin Xu, Pu Zhao, Xue Lin
- Abstract summary: D$^3$HR is a novel diffusion-based framework to generate distilled datasets with high representativeness. Our experiments demonstrate that D$^3$HR can achieve higher accuracy across different model architectures.
- Score: 49.3818035378669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep learning models demand larger datasets, driving the need for dataset distillation to create compact, cost-efficient datasets while maintaining performance. Owing to the powerful image-generation capability of diffusion models, they have been introduced to this field for generating distilled images. In this paper, we systematically investigate issues present in current diffusion-based dataset distillation methods, including inaccurate distribution matching, distribution deviation with random noise, and separate sampling. Building on this, we propose D$^3$HR, a novel diffusion-based framework to generate distilled datasets with high representativeness. Specifically, we adopt DDIM inversion to map the latents of the full dataset from a low-normality latent domain to a high-normality Gaussian domain, preserving information and ensuring structural consistency to generate representative latents for the distilled dataset. Furthermore, we propose an efficient sampling scheme to better align the representative latents with the high-normality Gaussian distribution. Our comprehensive experiments demonstrate that D$^3$HR can achieve higher accuracy across different model architectures compared with state-of-the-art baselines in dataset distillation. Source code: https://github.com/lin-zhao-resoLve/D3HR.
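As background for the abstract above: DDIM inversion deterministically re-noises a data latent along the DDIM trajectory, so the resulting near-Gaussian latent can later be mapped back to the original sample, unlike corrupting it with random noise. Below is a minimal sketch of the standard DDIM inversion update; the `noise_model` callable, the linear timestep schedule, and all function names are illustrative assumptions, not the authors' implementation (see the linked repository for D$^3$HR itself).

```python
import torch

def ddim_inversion_step(x_t, t, t_next, alphas_cumprod, noise_model):
    """One deterministic DDIM inversion step, moving x_t toward the Gaussian domain.

    alphas_cumprod: 1-D tensor of cumulative alpha-bar values per timestep.
    noise_model:    callable (x, t) -> predicted noise eps; an illustrative
                    stand-in for a pretrained diffusion model's epsilon predictor.
    """
    a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
    eps = noise_model(x_t, t)
    # Recover the model's estimate of the clean latent x_0 from x_t.
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()
    # Re-noise deterministically to the next (higher) timestep.
    return a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps

def invert_to_gaussian(x0, alphas_cumprod, noise_model, num_steps=50):
    """Run DDIM inversion from a data latent x0 up to a near-Gaussian latent."""
    timesteps = torch.linspace(0, len(alphas_cumprod) - 1, num_steps).long()
    x = x0
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = ddim_inversion_step(x, t, t_next, alphas_cumprod, noise_model)
    return x  # approximately Gaussian; representative latents can be drawn here
```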
Related papers
- Multimodal Atmospheric Super-Resolution With Deep Generative Models [0.0]
Score-based diffusion modeling is a generative machine learning algorithm that can be used to sample from complex distributions. In this article, we apply such a concept to the super-resolution of a high-dimensional dynamical system, given the real-time availability of low-resolution and experimentally observed sparse sensor measurements.
arXiv Detail & Related papers (2025-06-28T06:47:09Z) - Diversity-Driven Generative Dataset Distillation Based on Diffusion Model with Self-Adaptive Memory [33.38900857290244]
We present a diversity-driven generative dataset distillation method based on a diffusion model to address limited sample diversity. We introduce self-adaptive memory to align the distribution between distilled and real datasets, assessing their representativeness. Our method outperforms existing state-of-the-art methods in most situations.
arXiv Detail & Related papers (2025-05-26T03:48:56Z) - Dataset Distillation of 3D Point Clouds via Distribution Matching [10.294595110092407]
We propose a distribution matching-based distillation framework for 3D point clouds (a generic distribution-matching loss is sketched after this list). To address the semantic misalignment caused by the unordered indexing of points, we introduce a Semantically Aligned Distribution Matching loss. To address rotation variation, we jointly learn the optimal rotation angles while updating the synthetic dataset to better align with the original feature distribution.
arXiv Detail & Related papers (2025-03-28T05:15:22Z) - Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data. DSD pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs.
arXiv Detail & Related papers (2025-03-10T17:44:46Z) - Generative Dataset Distillation Based on Self-knowledge Distillation [49.20086587208214]
We present a novel generative dataset distillation method that can improve the accuracy of aligning prediction logits. Our approach integrates self-knowledge distillation to achieve more precise distribution matching between the synthetic and original data. Our method outperforms existing state-of-the-art methods, resulting in superior distillation performance.
arXiv Detail & Related papers (2025-01-08T00:43:31Z) - FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation [31.06080108012735]
We propose an efficient Monocular Depth Estimation (MDE) approach named FiffDepth. FiffDepth transforms diffusion-based image generators into a feed-forward architecture for detailed depth estimation. We demonstrate that FiffDepth achieves exceptional accuracy, stability, and fine-grained detail, offering significant improvements in MDE performance.
arXiv Detail & Related papers (2024-12-01T04:59:34Z) - High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness. We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models. Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - D$^4$M: Dataset Distillation via Disentangled Diffusion Model [4.568710926635445]
We propose an efficient framework for dataset distillation via a Disentangled Diffusion Model (D$^4$M).
Compared to architecture-dependent methods, D$^4$M employs a latent diffusion model to guarantee consistency and incorporates label information into category prototypes.
D$^4$M demonstrates superior performance and robust generalization, surpassing SOTA methods across most aspects.
arXiv Detail & Related papers (2024-07-21T12:16:20Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on a distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - Latent Dataset Distillation with Diffusion Models [9.398135472047132]
This paper proposes Latent Dataset Distillation with Diffusion Models (LD3M).
Our novel diffusion process is tailored for this task and significantly improves the gradient flow for distillation.
Overall, LD3M consistently outperforms state-of-the-art methods by up to 4.8 p.p. and 4.2 p.p. for 1 and 10 images per class, respectively.
arXiv Detail & Related papers (2024-03-06T17:41:41Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models [1.1965844936801797]
Generative modeling of 3D LiDAR data is an emerging task with promising applications for autonomous mobile robots.
We present R2DM, a novel generative model for LiDAR data that can generate diverse and high-fidelity 3D scene point clouds.
Our method is built upon denoising diffusion probabilistic models (DDPMs), which have shown impressive results among generative model frameworks.
arXiv Detail & Related papers (2023-09-17T12:26:57Z)
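Several entries above, including the 3D point-cloud work referenced earlier, build on distribution matching: making the feature statistics of the synthetic set match those of the real set. The following is a minimal, generic sketch of the mean-embedding version of that objective. The `encoder`, the plain squared-difference loss, and the optimization loop are illustrative assumptions; none of the listed papers' specific losses (e.g., the Semantically Aligned Distribution Matching loss) are reproduced here.

```python
import torch

def distribution_matching_loss(real_feats: torch.Tensor,
                               syn_feats: torch.Tensor) -> torch.Tensor:
    """Match the mean feature embedding of synthetic data to that of real data.

    real_feats: (N, D) features of real samples from some encoder.
    syn_feats:  (M, D) features of synthetic samples from the same encoder.
    """
    return (real_feats.mean(dim=0) - syn_feats.mean(dim=0)).pow(2).sum()

def distill(real_feats, syn_data, encoder, steps=1000, lr=0.1):
    """Illustrative loop: the synthetic dataset itself is the learned parameter."""
    syn_data = syn_data.clone().requires_grad_(True)
    opt = torch.optim.SGD([syn_data], lr=lr)
    for _ in range(steps):
        loss = distribution_matching_loss(real_feats, encoder(syn_data))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return syn_data.detach()
```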
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.