Distribution-Aware Data Expansion with Diffusion Models
- URL: http://arxiv.org/abs/2403.06741v2
- Date: Wed, 5 Jun 2024 10:15:24 GMT
- Title: Distribution-Aware Data Expansion with Diffusion Models
- Authors: Haowei Zhu, Ling Yang, Jun-Hai Yong, Hongzhi Yin, Jiawei Jiang, Meng Xiao, Wentao Zhang, Bin Wang,
- Abstract summary: We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
- Score: 55.979857976023695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scale and quality of a dataset significantly impact the performance of deep models. However, acquiring large-scale annotated datasets is both a costly and time-consuming endeavor. To address this challenge, dataset expansion technologies aim to automatically augment datasets, unlocking the full potential of deep models. Current data expansion techniques include image transformation and image synthesis methods. Transformation-based methods introduce only local variations, leading to limited diversity. In contrast, synthesis-based methods generate entirely new content, greatly enhancing informativeness. However, existing synthesis methods carry the risk of distribution deviations, potentially degrading model performance with out-of-distribution samples. In this paper, we propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model. DistDiff constructs hierarchical prototypes to approximate the real data distribution, optimizing latent data points within diffusion models with hierarchical energy guidance. We demonstrate its capability to generate distribution-consistent samples, significantly improving data expansion tasks. DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data. Furthermore, our approach consistently outperforms existing synthesis-based techniques and demonstrates compatibility with widely adopted transformation-based augmentation methods. Additionally, the expanded dataset exhibits robustness across various architectural frameworks. Our code is available at https://github.com/haoweiz23/DistDiff
Related papers
- D$^4$M: Dataset Distillation via Disentangled Diffusion Model [4.568710926635445]
We propose an efficient framework for dataset distillation via Disentangled Diffusion Model (D$4$M)
Compared to architecture-dependent methods, D$4$M employs latent diffusion model to guarantee consistency and incorporates label information into category prototypes.
D$4$M demonstrates superior performance and robust generalization, surpassing the SOTA methods across most aspects.
arXiv Detail & Related papers (2024-07-21T12:16:20Z) - Expansive Synthesis: Generating Large-Scale Datasets from Minimal Samples [13.053285552524052]
This paper introduces an innovative Expansive Synthesis model that generates high-fidelity datasets from minimal samples.
We validate our Expansive Synthesis by training classifiers on the generated datasets and comparing their performance toversas trained on larger, original datasets.
arXiv Detail & Related papers (2024-06-25T02:59:02Z) - Fake It Till Make It: Federated Learning with Consensus-Oriented
Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG)
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z) - Expanding Expressiveness of Diffusion Models with Limited Data via
Self-Distillation based Fine-Tuning [24.791783885165923]
Training diffusion models on limited datasets poses challenges in terms of limited generation capacity and expressiveness.
We propose Self-Distillation for Fine-Tuning diffusion models (SDFT) to address these challenges.
arXiv Detail & Related papers (2023-11-02T06:24:06Z) - Improving Out-of-Distribution Robustness of Classifiers via Generative
Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z) - Phoenix: A Federated Generative Diffusion Model [6.09170287691728]
Training generative models on large centralized datasets can pose challenges in terms of data privacy, security, and accessibility.
This paper proposes a novel method for training a Denoising Diffusion Probabilistic Model (DDPM) across multiple data sources using Federated Learning (FL) techniques.
arXiv Detail & Related papers (2023-06-07T01:43:09Z) - T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified
Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising of a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z) - Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data [0.0]
Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges.
Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models.
"phased data augmentation" is a novel technique that addresses this gap by optimizing training in limited data scenarios without altering the inherent data distribution.
arXiv Detail & Related papers (2023-05-22T03:38:59Z) - Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z) - Deep Variational Models for Collaborative Filtering-based Recommender
Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results.
Our proposed models apply the variational concept to injectity in the latent space of the deep architecture.
Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.