Sparse Data Generation Using Diffusion Models
- URL: http://arxiv.org/abs/2502.02448v1
- Date: Tue, 04 Feb 2025 16:14:28 GMT
- Title: Sparse Data Generation Using Diffusion Models
- Authors: Phil Ostheimer, Mayank Nagda, Marius Kloft, Sophie Fellenz
- Abstract summary: We introduce Sparse Data Diffusion (SDD), a novel method for generating sparse data.
SDD achieves high fidelity in representing data sparsity while preserving the quality of the generated data.
- Score: 22.560860958917672
- Abstract: Sparse data is ubiquitous, appearing in numerous domains, from economics and recommender systems to astronomy and biomedical sciences. However, efficiently and realistically generating sparse data remains a significant challenge. We introduce Sparse Data Diffusion (SDD), a novel method for generating sparse data. SDD extends continuous state-space diffusion models by explicitly modeling sparsity through the introduction of Sparsity Bits. Empirical validation on image data from various domains, including two scientific applications in physics and biology, demonstrates that SDD achieves high fidelity in representing data sparsity while preserving the quality of the generated data.
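The abstract does not spell out how Sparsity Bits enter the model. Below is a minimal sketch of one plausible reading, assuming each entry is paired with a binary indicator of being exactly zero, so that a continuous state-space diffusion model can denoise values and sparsity jointly; the function names and the ±1 encoding are illustrative assumptions, not the paper's API.

```python
import torch

def encode_with_sparsity_bits(x: torch.Tensor) -> torch.Tensor:
    # Pair every entry with a "sparsity bit": +1 if the entry is
    # nonzero, -1 if it is exactly zero. Both channels live in
    # continuous space, so a standard diffusion model can denoise them.
    bits = torch.where(x != 0, torch.ones_like(x), -torch.ones_like(x))
    return torch.cat([x, bits], dim=-1)

def decode_with_sparsity_bits(xb: torch.Tensor) -> torch.Tensor:
    # After sampling, threshold the denoised bit channel and force
    # entries whose bit is negative back to exact zero.
    x, bits = xb.chunk(2, dim=-1)
    return torch.where(bits > 0, x, torch.zeros_like(x))

# Round trip: the sparsity pattern is preserved exactly.
x = torch.tensor([[0.0, 1.3, 0.0, -0.7]])
assert torch.equal(decode_with_sparsity_bits(encode_with_sparsity_bits(x)), x)
```

Thresholding the bit channel at sampling time restores exact zeros, which would account for the "high fidelity in representing data sparsity" the abstract claims.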
Related papers
- A Novel Diffusion Model for Pairwise Geoscience Data Generation with Unbalanced Training Dataset [8.453075713579631]
We present UB-Diff, a novel diffusion model for multi-modal paired scientific data generation.
One major innovation is a one-in-two-out encoder-decoder network structure, which ensures that paired data are obtained from a shared co-latent representation.
Experimental results on the OpenFWI dataset show that UB-Diff significantly outperforms existing techniques in terms of Fréchet Inception Distance (FID) score and pairwise evaluation.
arXiv Detail & Related papers (2025-01-01T19:49:38Z)
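As a rough illustration of the one-in-two-out idea in the entry above: one encoder maps an input to a shared co-latent code, and two decoders emit the two modalities of a pair from that single code, which keeps generated pairs aligned. The layer sizes and names below are placeholders, not UB-Diff's actual architecture.

```python
import torch
import torch.nn as nn

class OneInTwoOut(nn.Module):
    """Hypothetical one-in-two-out network: one encoder produces a
    shared co-latent code; two decoders reconstruct each modality of
    the pair from that single code."""
    def __init__(self, dim_a=256, dim_b=256, latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_a, 128), nn.ReLU(),
                                     nn.Linear(128, latent))
        self.decode_a = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                      nn.Linear(128, dim_a))
        self.decode_b = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(),
                                      nn.Linear(128, dim_b))

    def forward(self, x_a):
        z = self.encoder(x_a)                       # shared co-latent code
        return self.decode_a(z), self.decode_b(z)   # paired outputs

# A diffusion model trained in this latent space could sample z and
# decode it into a consistent (modality A, modality B) pair.
model = OneInTwoOut()
a, b = model(torch.randn(4, 256))
```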
- Data Augmentation via Diffusion Model to Enhance AI Fairness [1.2979015577834876]
This paper explores the potential of diffusion models to generate synthetic data to improve AI fairness.
The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM) was utilized with different amounts of generated data for data augmentation.
Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
arXiv Detail & Related papers (2024-10-20T18:52:31Z)
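The entry above does not name a fairness metric; as an assumed but common choice, the sketch below computes the demographic parity gap that such an augmentation experiment would compare before and after adding synthetic rows to the training data.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    # |P(y_hat=1 | group=0) - P(y_hat=1 | group=1)|: one standard
    # fairness measure for binary classifiers; smaller is fairer.
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Retraining on data augmented with synthetic minority-group rows
# should shrink this gap if fairness improved.
print(demographic_parity_gap([1, 1, 0, 0, 1, 0], [0, 0, 0, 1, 1, 1]))
```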
- DreamDA: Generative Data Augmentation with Diffusion Models [68.22440150419003]
This paper proposes DreamDA, a new classification-oriented framework.
DreamDA generates diverse samples that adhere to the original data distribution by considering training images in the original data as seeds.
In addition, since the labels of the generated data may not align with the labels of their corresponding seed images, we introduce a self-training paradigm for generating pseudo labels.
arXiv Detail & Related papers (2024-03-19T15:04:35Z)
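A small sketch of such a self-training step follows; the confidence filtering is a common self-training heuristic and our own assumption, not a detail from the DreamDA abstract. The idea: relabel generated images with the current classifier and keep only confident predictions for the next training round.

```python
import torch

@torch.no_grad()
def pseudo_label(classifier: torch.nn.Module,
                 generated: torch.Tensor,
                 threshold: float = 0.9):
    # Since labels of generated samples may not match their seed
    # images, relabel them with the current classifier and keep
    # only confident predictions.
    probs = classifier(generated).softmax(dim=-1)
    confidence, labels = probs.max(dim=-1)
    keep = confidence >= threshold
    return generated[keep], labels[keep]

# Usage with a toy classifier over 8x8 "images" flattened to 64 dims.
clf = torch.nn.Linear(64, 10)
images, labels = pseudo_label(clf, torch.randn(32, 64))
```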
- Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on a distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
- From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery [35.5507452011217]
Cross-modal techniques for molecule discovery frequently encounter the issue of data scarcity, hampering their performance and application.
We introduce a retrieval-based prompting strategy to construct high-quality pseudo data, then explore the optimal method to effectively leverage this pseudo data.
Experiments show that using pseudo data for domain adaptation outperforms all existing methods, while also requiring a smaller model scale, reduced data size and lower training cost.
arXiv Detail & Related papers (2023-09-11T02:35:36Z)
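A toy sketch of retrieval-based prompt construction, with all specifics assumed: plain string similarity over SMILES stands in for the paper's retriever, and the prompt template is invented for illustration. The retrieved (molecule, description) pairs serve as in-context examples for the language model.

```python
from difflib import SequenceMatcher

def build_prompt(query_smiles, corpus, k=3):
    # Rank (SMILES, description) pairs by similarity to the query
    # molecule and place the top-k in the prompt as in-context
    # examples before asking for the query's description.
    scored = sorted(corpus,
                    key=lambda ex: SequenceMatcher(None, query_smiles,
                                                   ex[0]).ratio(),
                    reverse=True)
    examples = "\n".join(f"Molecule: {s}\nDescription: {d}"
                         for s, d in scored[:k])
    return f"{examples}\nMolecule: {query_smiles}\nDescription:"

corpus = [("CCO", "Ethanol, a simple alcohol."),
          ("CC(=O)O", "Acetic acid, the acid in vinegar."),
          ("c1ccccc1", "Benzene, an aromatic hydrocarbon.")]
print(build_prompt("CCN", corpus, k=2))
```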
- Deep Generative Modeling-based Data Augmentation with Demonstration using the BFBT Benchmark Void Fraction Datasets [3.341975883864341]
This paper explores applying deep generative models (DGMs), which have been widely used for image generation, to scientific data augmentation.
Once trained, DGMs can be used to generate synthetic data that are similar to the training data and significantly expand the dataset size.
arXiv Detail & Related papers (2023-08-19T22:19:41Z)
- Score-based Diffusion Models in Function Space [137.70916238028306]
Diffusion models have recently emerged as a powerful framework for generative modeling.
This work introduces a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
- Diffusion Earth Mover's Distance and Distribution Embeddings [61.49248071384122]
Diffusion EMD can be computed in $\tilde{O}(n)$ time and is more accurate than similarly fast algorithms such as tree-based EMDs.
We show that Diffusion EMD is fully differentiable, making it amenable to future uses in gradient-descent frameworks such as deep neural networks.
arXiv Detail & Related papers (2021-02-25T13:18:32Z)
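A minimal sketch of the multiscale construction behind a diffusion EMD, under assumed details (dyadic scales and a simple per-scale weighting): diffuse each distribution over a graph, stack the diffused densities, and take an L1 distance between the stacks. Because everything reduces to matrix products and absolute values, the quantity is differentiable, as the entry above claims.

```python
import numpy as np

def multiscale_diffusion_embedding(mu, P, scales=(1, 2, 4, 8)):
    # Diffuse the distribution mu with transition matrix P for each
    # dyadic number of steps and stack the down-weighted results.
    out, x, done = [], mu, 0
    for s in scales:
        for _ in range(s - done):
            x = P @ x
        done = s
        out.append(x / s)  # assumed weighting: coarser scales count less
    return np.concatenate(out)

def diffusion_emd(mu, nu, P):
    # L1 distance between multiscale embeddings: built entirely from
    # matrix products and absolute values, hence differentiable.
    return np.abs(multiscale_diffusion_embedding(mu, P)
                  - multiscale_diffusion_embedding(nu, P)).sum()

# Toy example: lazy random walk on a 3-node chain (column-stochastic,
# so P @ mu propagates a distribution one step).
P = np.array([[0.50, 0.25, 0.00],
              [0.50, 0.50, 0.50],
              [0.00, 0.25, 0.50]])
print(diffusion_emd(np.array([1.0, 0, 0]), np.array([0, 0, 1.0]), P))
```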
- VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data [16.00692074660383]
VAEM is a deep generative model that is trained in two stages.
We show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
arXiv Detail & Related papers (2020-06-21T23:47:32Z)
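A compressed sketch of a two-stage scheme in the spirit of VAEM, with all architectural details assumed: stage one fits an independent marginal VAE per feature so each heterogeneous column is mapped into a comparable latent space; stage two fits a "dependency" VAE on the concatenated marginal latents to capture correlations across features.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    # Minimal Gaussian VAE block reused for both stages.
    def __init__(self, d_in, d_latent):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_latent)  # -> (mean, log-variance)
        self.dec = nn.Linear(d_latent, d_in)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), z

# Stage 1: one marginal VAE per feature normalizes each column.
features = torch.randn(16, 3)                 # 3 heterogeneous columns
marginals = [TinyVAE(1, 1) for _ in range(3)]
zs = [vae(features[:, i:i + 1])[1] for i, vae in enumerate(marginals)]

# Stage 2: a dependency VAE on the concatenated marginal latents
# models the correlations between features.
dependency = TinyVAE(3, 2)
recon, _ = dependency(torch.cat(zs, dim=-1))
```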