CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular
Synthesis
- URL: http://arxiv.org/abs/2304.12654v2
- Date: Thu, 21 Sep 2023 13:40:53 GMT
- Title: CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular
Synthesis
- Authors: Chaejeong Lee, Jayoung Kim, Noseong Park
- Abstract summary: We propose to process continuous and discrete variables separately (but conditioned on each other) by two diffusion models.
The two diffusion models are co-evolved during training by reading conditions from each other.
In our experiments with 11 real-world datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With growing attention to tabular data, efforts to apply synthetic
tables have expanded across a variety of tasks and scenarios. Owing to recent
advances in generative modeling, the fake data generated by tabular data
synthesis models have become sophisticated and realistic. However, modeling the
discrete variables (columns) of tabular data remains difficult. In this work,
we propose to process continuous and discrete variables separately (but
conditioned on each other) with two diffusion models. The two diffusion models
are co-evolved during training by reading conditions from each other. To
further bind the two diffusion models, we also introduce a contrastive learning
method with negative sampling. In our experiments with 11 real-world tabular
datasets and 8 baseline methods, we demonstrate the efficacy of the proposed
method, called CoDi.
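The co-evolving conditioning described in the abstract can be sketched roughly as follows. This is a minimal illustration, not CoDi's actual architecture: both denoiser functions here are hypothetical stand-ins for the paper's neural networks, and the update rules are arbitrary placeholders. The point is only the data flow, in which each branch reads the other branch's current state as its condition at every reverse step.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_continuous(x_cont, cond_disc, t):
    """Hypothetical stand-in for the continuous-branch denoiser
    (a neural network in the paper), conditioned on the discrete state."""
    return x_cont - 0.1 * t * (x_cont - cond_disc.mean())

def denoise_discrete(x_disc, cond_cont, t):
    """Hypothetical stand-in for the discrete-branch denoiser
    (multinomial diffusion in the paper), conditioned on the continuous state."""
    return x_disc + 0.1 * t * cond_cont.mean()

# Noisy starting states for one table row: continuous columns and
# one-hot-style logits for a discrete column.
x_cont = rng.normal(size=4)
x_disc = rng.normal(size=3)

# At each reverse step the two branches read conditions from each other;
# the tuple assignment ensures both see the other's *previous* state.
for t in range(5, 0, -1):
    x_cont, x_disc = (denoise_continuous(x_cont, x_disc, t),
                      denoise_discrete(x_disc, x_cont, t))
```

The simultaneous tuple assignment mirrors the "co-evolved" coupling: neither branch denoises in isolation, so dependencies between continuous and discrete columns can be preserved.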
Related papers
- Continuous Diffusion Model for Language Modeling [57.396578974401734]
Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches.
We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
arXiv Detail & Related papers (2025-02-17T08:54:29Z)
- Understanding and Mitigating Memorization in Diffusion Models for Tabular Data [16.02060275534452]
Memorization occurs when models inadvertently replicate exact or near-identical training data.
We propose TabCutMix, a simple yet effective data augmentation technique.
We show that TabCutMix effectively mitigates memorization while maintaining high-quality data generation.
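As a rough illustration of the CutMix-style augmentation idea behind TabCutMix, the sketch below swaps a random subset of features between two tabular rows. The function name, signature, and feature-selection scheme are assumptions for illustration; the paper's actual procedure (e.g. how it pairs same-class rows) should be consulted for details.

```python
import numpy as np

def tab_cutmix(row_a, row_b, lam, rng):
    """Swap a random subset of features from row_b into row_a.
    `lam` is the fraction of features kept from row_a; the two rows are
    assumed to share the same class label so the mixed row's label stays valid."""
    d = len(row_a)
    n_swap = int(round((1.0 - lam) * d))
    swap_idx = rng.choice(d, size=n_swap, replace=False)
    mixed = row_a.copy()
    mixed[swap_idx] = row_b[swap_idx]
    return mixed

rng = np.random.default_rng(42)
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([9.0, 9.0, 9.0, 9.0])
mixed = tab_cutmix(a, b, lam=0.5, rng=rng)  # half the features come from b
```

Because the mixed row interpolates between real training rows feature-wise rather than copying one row wholesale, a generative model trained on such augmented data is less likely to memorize any single training record.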
arXiv Detail & Related papers (2024-12-15T04:04:37Z)
- TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- Discrete Copula Diffusion [44.96934660818884]
We identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps.
We introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model.
Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps.
arXiv Detail & Related papers (2024-10-02T18:51:38Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
- Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models [14.651592234678722]
Current diffusion models tend to inherit bias in the training dataset and generate biased synthetic data.
We introduce a novel model that incorporates sensitive guidance to generate fair synthetic data with balanced joint distributions of the target label and sensitive attributes.
Our method effectively mitigates bias in training data while maintaining the quality of the generated samples.
arXiv Detail & Related papers (2024-04-12T06:08:43Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
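A contrastive objective of the general kind alluded to here can be sketched with a toy InfoNCE computation: embeddings of the same class are pulled together while other classes are pushed apart, reducing overlap between class distributions. This is illustrative only; the function and the embeddings below are invented for the sketch, and the paper's actual overlap-minimization objective is more involved.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Toy InfoNCE loss: high similarity to the positive and low similarity
    to the negatives yields a small loss. Inputs are 1-D embedding vectors."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0

anchor = np.array([1.0, 0.0])
same_class = np.array([0.9, 0.1])
other_class = np.array([0.0, 1.0])
low = info_nce(anchor, same_class, [other_class])   # well-separated classes
high = info_nce(anchor, other_class, [same_class])  # overlapping classes
```

Minimizing such a loss over synthetic samples discourages the generator from placing different classes in overlapping regions of embedding space, which is the intuition behind the overlap-optimization approach.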
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- OCD: Learning to Overfit with Conditional Diffusion Models [95.1828574518325]
We present a dynamic model in which the weights are conditioned on an input sample x.
We learn to match those weights that would be obtained by finetuning a base model on x and its label y.
arXiv Detail & Related papers (2022-10-02T09:42:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.