Generative Adversarial Networks for Synthetic Data Generation: A Comparative Study
- URL: http://arxiv.org/abs/2112.01925v1
- Date: Fri, 3 Dec 2021 14:23:17 GMT
- Authors: Claire Little, Mark Elliot, Richard Allmendinger, Sahel Shariati Samani
- Abstract summary: Generative Adversarial Networks (GANs) are gaining increasing attention as a means for synthesising data.
Here we consider the potential application of GANs for the purpose of generating synthetic census microdata.
- Score: 1.0896567381206714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative Adversarial Networks (GANs) are gaining increasing attention as a
means for synthesising data. So far much of this work has been applied to use
cases outside of the data confidentiality domain with a common application
being the production of artificial images. Here we consider the potential
application of GANs for the purpose of generating synthetic census microdata.
We employ a battery of utility metrics and a disclosure risk metric (the
Targeted Correct Attribution Probability) to compare the data produced by
tabular GANs with those produced using orthodox data synthesis methods.
Related papers
- A Survey on Tabular Data Generation: Utility, Alignment, Fidelity, Privacy, and Beyond [53.56796220109518]
Different use cases demand synthetic data to comply with different requirements to be useful in practice.
Four types of requirements are reviewed: utility of the synthetic data, alignment of the synthetic data with domain-specific knowledge, statistical fidelity of the synthetic data distribution compared to the real data distribution, and privacy-preserving capabilities.
We discuss future directions for the field, along with opportunities to improve the current evaluation methods.
arXiv Detail & Related papers (2025-03-07T21:47:11Z)
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis [0.4999814847776097]
Generative adversarial networks (GANs) have drawn considerable attention in recent years for their proven capability in generating synthetic data.
The validity of the synthetic data and the underlying privacy concerns represent major challenges which are not sufficiently addressed.
arXiv Detail & Related papers (2023-07-01T16:52:18Z)
- Generative Adversarial Networks for Data Augmentation [0.0]
GANs have been utilized in medical image analysis for various tasks, including data augmentation, image creation, and domain adaptation.
GANs can generate synthetic samples that can be used to increase the available dataset.
It is essential to note that the use of GANs in medical imaging is still an active area of research to ensure that the produced images are of high quality and suitable for use in clinical settings.
arXiv Detail & Related papers (2023-06-03T06:33:33Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- Distributed Conditional GAN (discGAN) For Synthetic Healthcare Data Generation [0.0]
We propose distributed Generative Adversarial Networks (discGANs) to generate synthetic data specific to the healthcare domain.
We generated 249,000 synthetic records from the original 2,027 eICU dataset.
Our results show that discGAN was able to generate data with distributions similar to the real data.
arXiv Detail & Related papers (2023-04-09T18:35:05Z)
- Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders [47.89542334125886]
We combine the variational autoencoder framework with graph neural networks to generate realistic synthetic relational databases.
The results indicate that real databases' structures are accurately preserved in the resulting synthetic datasets.
arXiv Detail & Related papers (2022-11-30T10:40:44Z)
- Sequential IoT Data Augmentation using Generative Adversarial Networks [5.8010446129208155]
Sequential data in industrial applications can be used to train and evaluate machine learning models.
Since gathering representative amounts of data is difficult and time-consuming, there is an incentive to generate it from a small ground truth.
This paper investigates the possibility of using GANs in order to augment sequential Internet of Things (IoT) data.
arXiv Detail & Related papers (2021-01-13T11:08:07Z)
- Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)
- Using generative adversarial networks to synthesize artificial financial datasets [2.376767664163658]
We propose to use GANs to synthesize artificial financial data for research and benchmarking purposes.
We test this approach on three American Express datasets, and show that properly trained GANs can replicate these datasets with high fidelity.
arXiv Detail & Related papers (2020-02-06T14:25:08Z)