CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular
Synthesis
- URL: http://arxiv.org/abs/2304.12654v2
- Date: Thu, 21 Sep 2023 13:40:53 GMT
- Title: CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular
Synthesis
- Authors: Chaejeong Lee, Jayoung Kim, Noseong Park
- Abstract summary: We propose to process continuous and discrete variables separately (but conditioned on each other) by two diffusion models.
The two diffusion models are co-evolved during training by reading conditions from each other.
In our experiments with 11 real-world datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With growing attention to tabular data, efforts to apply synthetic
tables have expanded across a variety of tasks and scenarios. Owing to recent
advances in generative modeling, the fake data generated by tabular data
synthesis models have become sophisticated and realistic. However, modeling the
discrete variables (columns) of tabular data remains difficult. In this work,
we propose to process continuous and discrete variables separately (but
conditioned on each other) with two diffusion models. The two diffusion models
are co-evolved during training by reading conditions from each other. To
further bind the two diffusion models, we also introduce a contrastive learning
method with negative sampling. In our experiments with 11 real-world tabular
datasets and 8 baseline methods, we demonstrate the efficacy of the proposed
method, called CoDi.
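The co-evolving conditioning described in the abstract can be sketched roughly as follows. This is a minimal illustration, not CoDi's actual architecture: both denoiser functions here are hypothetical stand-ins for the paper's neural networks, and the update rules are arbitrary placeholders. The point is only the data flow, in which each branch reads the other branch's current state as its condition at every reverse step.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_continuous(x_cont, cond_disc, t):
    """Hypothetical stand-in for the continuous-branch denoiser
    (a neural network in the paper), conditioned on the discrete state."""
    return x_cont - 0.1 * t * (x_cont - cond_disc.mean())

def denoise_discrete(x_disc, cond_cont, t):
    """Hypothetical stand-in for the discrete-branch denoiser
    (multinomial diffusion in the paper), conditioned on the continuous state."""
    return x_disc + 0.1 * t * cond_cont.mean()

# Noisy starting states for one table row: continuous columns and
# one-hot-style logits for a discrete column.
x_cont = rng.normal(size=4)
x_disc = rng.normal(size=3)

# At each reverse step the two branches read conditions from each other;
# the tuple assignment ensures both see the other's *previous* state.
for t in range(5, 0, -1):
    x_cont, x_disc = (denoise_continuous(x_cont, x_disc, t),
                      denoise_discrete(x_disc, x_cont, t))
```

The simultaneous tuple assignment mirrors the "co-evolved" coupling: neither branch denoises in isolation, so dependencies between continuous and discrete columns can be preserved.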
Related papers
- Continuous Diffusion Model for Language Modeling [57.396578974401734]
Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches.
We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
arXiv Detail & Related papers (2025-02-17T08:54:29Z)
- Understanding and Mitigating Memorization in Diffusion Models for Tabular Data [16.02060275534452]
Memorization occurs when models inadvertently replicate exact or near-identical training data.
We propose TabCutMix, a simple yet effective data augmentation technique.
We show that TabCutMix effectively mitigates memorization while maintaining high-quality data generation.
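As a rough illustration of the CutMix-style augmentation idea behind TabCutMix, the sketch below swaps a random subset of features between two tabular rows. The function name, signature, and feature-selection scheme are assumptions for illustration; the paper's actual procedure (e.g. how it pairs same-class rows) should be consulted for details.

```python
import numpy as np

def tab_cutmix(row_a, row_b, lam, rng):
    """Swap a random subset of features from row_b into row_a.
    `lam` is the fraction of features kept from row_a; the two rows are
    assumed to share the same class label so the mixed row's label stays valid."""
    d = len(row_a)
    n_swap = int(round((1.0 - lam) * d))
    swap_idx = rng.choice(d, size=n_swap, replace=False)
    mixed = row_a.copy()
    mixed[swap_idx] = row_b[swap_idx]
    return mixed

rng = np.random.default_rng(42)
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([9.0, 9.0, 9.0, 9.0])
mixed = tab_cutmix(a, b, lam=0.5, rng=rng)  # half the features come from b
```

Because the mixed row interpolates between real training rows feature-wise rather than copying one row wholesale, a generative model trained on such augmented data is less likely to memorize any single training record.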
arXiv Detail & Related papers (2024-12-15T04:04:37Z)
- TabDiff: a Mixed-type Diffusion Model for Tabular Data Generation [91.50296404732902]
We introduce TabDiff, a joint diffusion framework that models all mixed-type distributions of tabular data in one model.
Our key innovation is the development of a joint continuous-time diffusion process for numerical and categorical data.
TabDiff achieves superior average performance over existing competitive baselines, with up to 22.5% improvement over the state-of-the-art model on pair-wise column correlation estimations.
arXiv Detail & Related papers (2024-10-27T22:58:47Z)
- Discrete Copula Diffusion [44.96934660818884]
We identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps.
We introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model.
Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps.
arXiv Detail & Related papers (2024-10-02T18:51:38Z)
- Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z)
- Balanced Mixed-Type Tabular Data Synthesis with Diffusion Models [14.651592234678722]
Current diffusion models tend to inherit bias in the training dataset and generate biased synthetic data.
We introduce a novel model that incorporates sensitive guidance to generate fair synthetic data with balanced joint distributions of the target label and sensitive attributes.
Our method effectively mitigates bias in training data while maintaining the quality of the generated samples.
arXiv Detail & Related papers (2024-04-12T06:08:43Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
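A contrastive objective of the general kind alluded to here can be sketched with a toy InfoNCE computation: embeddings of the same class are pulled together while other classes are pushed apart, reducing overlap between class distributions. This is illustrative only; the function and the embeddings below are invented for the sketch, and the paper's actual overlap-minimization objective is more involved.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Toy InfoNCE loss: high similarity to the positive and low similarity
    to the negatives yields a small loss. Inputs are 1-D embedding vectors."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temperature
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0

anchor = np.array([1.0, 0.0])
same_class = np.array([0.9, 0.1])
other_class = np.array([0.0, 1.0])
low = info_nce(anchor, same_class, [other_class])   # well-separated classes
high = info_nce(anchor, other_class, [same_class])  # overlapping classes
```

Minimizing such a loss over synthetic samples discourages the generator from placing different classes in overlapping regions of embedding space, which is the intuition behind the overlap-optimization approach.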
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- OCD: Learning to Overfit with Conditional Diffusion Models [95.1828574518325]
We present a dynamic model in which the weights are conditioned on an input sample x.
We learn to match those weights that would be obtained by finetuning a base model on x and its label y.
arXiv Detail & Related papers (2022-10-02T09:42:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.