Using generative adversarial networks to synthesize artificial financial
datasets
- URL: http://arxiv.org/abs/2002.02271v1
- Date: Thu, 6 Feb 2020 14:25:08 GMT
- Title: Using generative adversarial networks to synthesize artificial financial
datasets
- Authors: Dmitry Efimov, Di Xu, Luyang Kong, Alexey Nefedov and Archana
Anandakrishnan
- Abstract summary: We propose to use GANs to synthesize artificial financial data for research and benchmarking purposes.
We test this approach on three American Express datasets, and show that properly trained GANs can replicate these datasets with high fidelity.
- Score: 2.376767664163658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative Adversarial Networks (GANs) became very popular for generation of
realistically looking images. In this paper, we propose to use GANs to
synthesize artificial financial data for research and benchmarking purposes. We
test this approach on three American Express datasets, and show that properly
trained GANs can replicate these datasets with high fidelity. For our
experiments, we define a novel type of GAN, and suggest methods for data
preprocessing that allow good training and testing performance of GANs. We also
discuss methods for evaluating the quality of generated data, and their
comparison with the original real data.
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLM) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose a LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z) - Testing Deep Learning Recommender Systems Models on Synthetic GAN-Generated Datasets [0.27624021966289597]
The published method Generative Adversarial Networks for Recommender Systems (GANRS) allows generating data sets for collaborative filtering recommendation systems.
We have tested the GANRS method by creating multiple synthetic datasets from three different real datasets taken as a source.
We have also selected six state-of-the-art collaborative filtering deep learning models to test both their comparative performance and the GANRS method.
arXiv Detail & Related papers (2024-10-23T08:09:48Z) - Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion [20.352548473293993]
Face Recognition (FR) models are trained on large-scale datasets, which have privacy and ethical concerns.
Lately, the use of synthetic data to complement or replace genuine data for the training of FR models has been proposed.
We introduce a new method, inspired by the physical motion of soft particles subjected to Brownian forces, allowing us to sample identities in a latent space under various constraints.
With this in hands, we generate several face datasets and benchmark them by training FR models, showing that data generated with our method exceeds the performance of previously GAN-based datasets and achieves competitive performance with state-of-the-
arXiv Detail & Related papers (2024-04-30T22:32:02Z) - DSF-GAN: DownStream Feedback Generative Adversarial Network [0.07083082555458872]
We propose a novel architecture called the DownStream Feedback Generative Adversarial Network (DSF-GAN)
DSF-GAN incorporates feedback from a downstream prediction model during training to augment the generator's loss function with valuable information.
Our experiments demonstrate improved model performance when training on synthetic samples generated by DSF-GAN, compared to those generated by the same GAN architecture without feedback.
arXiv Detail & Related papers (2024-03-27T05:41:50Z) - SMaRt: Improving GANs with Score Matching Regularity [94.81046452865583]
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex.
We show that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold.
We propose to improve the optimization of GANs with score matching regularity (SMaRt)
arXiv Detail & Related papers (2023-11-30T03:05:14Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - Generative adversarial networks for data-scarce spectral applications [0.0]
We report on an application of GANs in the domain of synthetic spectral data generation.
We show that CWGANs can act as a surrogate model with improved performance in the low-data regime.
arXiv Detail & Related papers (2023-07-14T16:27:24Z) - Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z) - FairGen: Fair Synthetic Data Generation [0.3149883354098941]
We propose a pipeline to generate fairer synthetic data independent of the GAN architecture.
We claim that while generating synthetic data most GANs amplify bias present in the training data but by removing these bias inducing samples, GANs essentially focuses more on real informative samples.
arXiv Detail & Related papers (2022-10-24T08:13:47Z) - Improving Generative Adversarial Networks with Local Coordinate Coding [150.24880482480455]
Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution.
In practice, semantic information might be represented by some latent distribution learned from data.
We propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.
arXiv Detail & Related papers (2020-07-28T09:17:50Z) - Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.