Related papers: Distribution-Aware Data Expansion with Diffusion Models

Distribution-Aware Data Expansion with Diffusion Models

URL: http://arxiv.org/abs/2403.06741v2
Date: Wed, 5 Jun 2024 10:15:24 GMT
Title: Distribution-Aware Data Expansion with Diffusion Models
Authors: Haowei Zhu, Ling Yang, Jun-Hai Yong, Hongzhi Yin, Jiawei Jiang, Meng Xiao, Wentao Zhang, Bin Wang,
Abstract summary: We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model. DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
Score: 55.979857976023695
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The scale and quality of a dataset significantly impact the performance of deep models. However, acquiring large-scale annotated datasets is both a costly and time-consuming endeavor. To address this challenge, dataset expansion technologies aim to automatically augment datasets, unlocking the full potential of deep models. Current data expansion techniques include image transformation and image synthesis methods. Transformation-based methods introduce only local variations, leading to limited diversity. In contrast, synthesis-based methods generate entirely new content, greatly enhancing informativeness. However, existing synthesis methods carry the risk of distribution deviations, potentially degrading model performance with out-of-distribution samples. In this paper, we propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model. DistDiff constructs hierarchical prototypes to approximate the real data distribution, optimizing latent data points within diffusion models with hierarchical energy guidance. We demonstrate its capability to generate distribution-consistent samples, significantly improving data expansion tasks. DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data. Furthermore, our approach consistently outperforms existing synthesis-based techniques and demonstrates compatibility with widely adopted transformation-based augmentation methods. Additionally, the expanded dataset exhibits robustness across various architectural frameworks. Our code is available at https://github.com/haoweiz23/DistDiff

Related papers

Dataset Distillation with Probabilistic Latent Features [9.318549327568695]
A compact set of synthetic data can effectively replace the original dataset in downstream classification tasks.<n>We propose a novel approach that models the joint distribution of latent features.<n>Our method achieves state-of-the-art cross architecture performance across a range of backbone architectures.
arXiv Detail & Related papers (2025-05-10T13:53:49Z)
Boosting Statistic Learning with Synthetic Data from Pretrained Large Models [6.596689283714373]
We propose a novel end-to-end framework that generates and systematically filters synthetic data through domain-specific statistical methods.<n>Our experiments demonstrate consistent improvements in predictive performance across various settings.
arXiv Detail & Related papers (2025-05-08T06:55:22Z)
Graph Representation Learning with Diffusion Generative Models [0.0]
We train a discrete diffusion model within an autoencoder framework to learn meaningful embeddings for graph data. Our approach demonstrates the potential of discrete diffusion models to be used for graph representation learning.
arXiv Detail & Related papers (2025-01-22T07:12:10Z)
TabDiff: a Multi-Modal Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all multi-modal distributions of tabular data in one model. Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data. TabDiff achieves superior average performance over existing competitive baselines, with up to $22.5%$ improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
Erase, then Redraw: A Novel Data Augmentation Approach for Free Space Detection Using Diffusion Model [5.57325257338134]
Traditional data augmentation methods cannot alter high-level semantic attributes. We propose a text-to-image diffusion model to parameterize image-to-image transformations. We achieve this goal by erasing instances of real objects from the original dataset and generating new instances with similar semantics in the erased regions.
arXiv Detail & Related papers (2024-09-30T10:21:54Z)
A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance. We propose a simple yet effective data augmentation approach by leveraging advancements in generative models. Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples. An autoencoder with self-attention layers and optimal transport refines distributional consistency. Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG) FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training. Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data. We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
Phoenix: A Federated Generative Diffusion Model [6.09170287691728]
Training generative models on large centralized datasets can pose challenges in terms of data privacy, security, and accessibility. This paper proposes a novel method for training a Denoising Diffusion Probabilistic Model (DDPM) across multiple data sources using Federated Learning (FL) techniques.
arXiv Detail & Related papers (2023-06-07T01:43:09Z)
Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data [0.0]
Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges. Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models. "phased data augmentation" is a novel technique that addresses this gap by optimizing training in limited data scenarios without altering the inherent data distribution.
arXiv Detail & Related papers (2023-05-22T03:38:59Z)
Deep Variational Models for Collaborative Filtering-based Recommender Systems [63.995130144110156]
Deep learning provides accurate collaborative filtering models to improve recommender system results. Our proposed models apply the variational concept to injectity in the latent space of the deep architecture. Results show the superiority of the proposed approach in scenarios where the variational enrichment exceeds the injected noise effect.
arXiv Detail & Related papers (2021-07-27T08:59:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.