Copula Flows for Synthetic Data Generation
- URL: http://arxiv.org/abs/2101.00598v1
- Date: Sun, 3 Jan 2021 10:06:23 GMT
- Title: Copula Flows for Synthetic Data Generation
- Authors: Sanket Kamthe, Samuel Assefa, Marc Deisenroth
- Abstract summary: We propose to use a probabilistic model as a synthetic data generator.
We benchmark our method on both simulated and real data-sets in terms of density estimation.
- Score: 0.5801044612920815
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ability to generate high-fidelity synthetic data is crucial when
available (real) data is limited or where privacy and data protection standards
allow only for limited use of the given data, e.g., in medical and financial
data-sets. Current state-of-the-art methods for synthetic data generation are
based on generative models, such as Generative Adversarial Networks (GANs).
Even though GANs have achieved remarkable results in synthetic data generation,
they are often challenging to interpret. Furthermore, GAN-based methods can
suffer when used with mixed real and categorical variables. Moreover, the loss
function (discriminator loss) design itself is problem-specific, i.e., the
generative model may not be useful for tasks it was not explicitly trained for.
In this paper, we propose to use a probabilistic model as a synthetic data
generator. Learning the probabilistic model for the data is equivalent to
estimating the density of the data. Based on the copula theory, we divide the
density estimation task into two parts, i.e., estimating univariate marginals
and estimating the multivariate copula density over the univariate marginals.
We use normalising flows to learn both the copula density and univariate
marginals. We benchmark our method on both simulated and real data-sets in
terms of density estimation as well as the ability to generate high-fidelity
synthetic data.
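The decomposition described above can be illustrated with a minimal sketch. The paper fits both the marginals and the copula with normalising flows; the sketch below substitutes simpler stand-ins (empirical CDFs for the marginals, a Gaussian copula for the flow-based copula density) purely to make the two-part structure concrete. All function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy import stats

def fit_copula_model(x):
    """Estimate univariate marginals (empirical CDFs) and a Gaussian
    copula over their probability-integral transforms, for data x of
    shape (n, d)."""
    n, d = x.shape
    # Probability integral transform: map each marginal to (0, 1) via ranks.
    u = stats.rankdata(x, axis=0) / (n + 1)
    # Map the uniforms to standard normals and estimate the copula correlation.
    z = stats.norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    # Keep the observed samples per dimension to invert marginals via quantiles.
    sorted_x = np.sort(x, axis=0)
    return corr, sorted_x

def sample_synthetic(corr, sorted_x, m, rng):
    """Draw m synthetic samples: copula sample -> inverse marginal CDFs."""
    d = corr.shape[0]
    # Sample the Gaussian copula: correlated normals mapped back to uniforms.
    z = rng.multivariate_normal(np.zeros(d), corr, size=m)
    u = stats.norm.cdf(z)
    # Invert each empirical marginal CDF with a quantile lookup.
    return np.column_stack(
        [np.quantile(sorted_x[:, j], u[:, j]) for j in range(d)]
    )

rng = np.random.default_rng(0)
# Toy data: two correlated, non-Gaussian marginals (lognormal and exponential).
base = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)
data = np.column_stack(
    [np.exp(base[:, 0]), stats.expon.ppf(stats.norm.cdf(base[:, 1]))]
)
corr, sorted_x = fit_copula_model(data)
synth = sample_synthetic(corr, sorted_x, 2000, rng)
```

Because monotone transforms preserve ranks, the copula estimate recovers the dependence structure of the latent Gaussian regardless of how skewed the marginals are; in the flow-based method, both stand-ins are replaced by learned flow densities.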
Related papers
- Marginal Causal Flows for Validation and Inference [3.547529079746247]
Investigating the marginal causal effect of an intervention on an outcome from complex data remains challenging.
We introduce Frugal Flows, a novel likelihood-based machine learning model that uses normalising flows to flexibly learn the data-generating process.
We demonstrate the above with experiments on both simulated and real-world datasets.
arXiv Detail & Related papers (2024-11-02T16:04:57Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Generative Modeling for Tabular Data via Penalized Optimal Transport Network [2.0319002824093015]
Wasserstein generative adversarial network (WGAN) is a notable improvement in generative modeling.
We propose POTNet, a generative deep neural network based on a novel, robust, and interpretable marginally-penalized Wasserstein (MPW) loss.
arXiv Detail & Related papers (2024-02-16T05:27:05Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Improving the quality of generative models through Smirnov transformation [1.3492000366723798]
We propose a novel activation function to be used as output of the generator agent.
It is based on the Smirnov probabilistic transformation and it is specifically designed to improve the quality of the generated data.
arXiv Detail & Related papers (2021-10-29T17:01:06Z)
- DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks [71.6879432974126]
We introduce DECAF: a GAN-based fair synthetic data generator for tabular data.
We show that DECAF successfully removes undesired bias and is capable of generating high-quality synthetic data.
We provide theoretical guarantees on the generator's convergence and the fairness of downstream models.
arXiv Detail & Related papers (2021-10-25T12:39:56Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- Causal-TGAN: Generating Tabular Data Using Causal Generative Adversarial Networks [7.232789848964222]
We propose a causal model named Causal Tabular Generative Neural Network (Causal-TGAN) to generate synthetic data.
Experiments on both simulated datasets and real datasets demonstrate the better performance of our method.
arXiv Detail & Related papers (2021-04-21T17:59:41Z)
- TraDE: Transformers for Density Estimation [101.20137732920718]
TraDE is a self-attention-based architecture for auto-regressive density estimation.
We present a suite of tasks such as regression using generated samples, out-of-distribution detection, and robustness to noise in the training data.
arXiv Detail & Related papers (2020-04-06T07:32:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.