Improving the quality of generative models through Smirnov
transformation
- URL: http://arxiv.org/abs/2110.15914v1
- Date: Fri, 29 Oct 2021 17:01:06 GMT
- Title: Improving the quality of generative models through Smirnov
transformation
- Authors: Ángel González-Prieto, Alberto Mozo, Sandra Gómez-Canaval, Edgar Talavera
- Abstract summary: We propose a novel activation function to be used as output of the generator agent.
It is based on the Smirnov probabilistic transformation and it is specifically designed to improve the quality of the generated data.
- Score: 1.3492000366723798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving the convergence issues of Generative Adversarial Networks (GANs) is
one of the most outstanding problems in generative models. In this work, we
propose a novel activation function to be used as output of the generator
agent. This activation function is based on the Smirnov probabilistic
transformation and it is specifically designed to improve the quality of the
generated data. In sharp contrast with previous works, our activation function
provides a more general approach that deals not only with the replication of
categorical variables but with any type of data distribution (continuous or
discrete). Moreover, our activation function is differentiable and can
therefore be seamlessly integrated into the backpropagation computations during
the GAN
training processes. To validate this approach, we evaluate our proposal against
two different data sets: a) an artificially rendered data set containing a
mixture of discrete and continuous variables, and b) a real data set of
flow-based network traffic data containing both normal connections and
cryptomining attacks. To evaluate the fidelity of the generated data, we
analyze both their results in terms of quality measures of statistical nature
and also regarding the use of these synthetic data to feed a nested machine
learning-based classifier. The experimental results show that the GAN network
tuned with this new activation function clearly outperforms both a naïve
mean-based generator and a standard GAN. The quality
of the data is so high that the generated data can fully substitute real data
for training the nested classifier without a fall in the obtained accuracy.
This result encourages the use of GANs to produce high-quality synthetic data
that are applicable in scenarios in which data privacy must be guaranteed.
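The Smirnov probabilistic transformation described above is, in essence, an inverse-CDF (quantile) map: uniform-like generator outputs are pushed through the empirical quantile function of the real data so that the generated marginal matches the real distribution. The sketch below is a minimal NumPy illustration of this general idea, using piecewise-linear interpolation of the empirical quantile function so the map stays differentiable almost everywhere; it is not the authors' exact construction, and the helper name `empirical_quantile_activation` is hypothetical.

```python
import numpy as np

def empirical_quantile_activation(u, real_samples):
    """Map generator outputs u in (0, 1) through the empirical
    quantile function (inverse CDF) of real_samples.

    Piecewise-linear interpolation keeps the map differentiable
    almost everywhere, so gradients can flow in backpropagation.
    """
    xs = np.sort(np.asarray(real_samples, dtype=float))
    n = xs.size
    # CDF levels associated with the sorted sample points: (i + 0.5) / n
    levels = (np.arange(n) + 0.5) / n
    # Linear interpolation between sorted real values; clamps at the ends
    return np.interp(u, levels, xs)

# Usage: push sigmoid-like generator outputs through the transform so the
# generated marginal mimics a skewed "real" distribution.
rng = np.random.default_rng(0)
real = rng.exponential(scale=2.0, size=10_000)   # stand-in real data
u = rng.uniform(size=5_000)                      # stand-in generator output
fake = empirical_quantile_activation(u, real)
```

In a GAN this map would be applied per output variable, which is what lets the same mechanism handle continuous and discrete marginals alike.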
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery [56.622854875204645]
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
arXiv Detail & Related papers (2024-10-21T03:35:23Z)
- An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches.
Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture.
We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
- SMaRt: Improving GANs with Score Matching Regularity [94.81046452865583]
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex.
We show that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold.
We propose to improve the optimization of GANs with score matching regularity (SMaRt)
arXiv Detail & Related papers (2023-11-30T03:05:14Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
arXiv Detail & Related papers (2023-05-16T07:30:29Z)
- A Kernelised Stein Statistic for Assessing Implicit Generative Models [10.616967871198689]
We propose a principled procedure to assess the quality of a synthetic data generator.
The sample size from the synthetic data generator can be as large as desired, while the size of the observed data, which the generator aims to emulate, is fixed.
arXiv Detail & Related papers (2022-05-31T23:40:21Z)
- Improving Model Compatibility of Generative Adversarial Networks by Boundary Calibration [24.28407308818025]
Boundary-Calibration GANs (BCGANs) are proposed to improve GANs' model compatibility.
BCGANs generate realistic images like the original GANs but also achieve superior model compatibility.
arXiv Detail & Related papers (2021-11-03T16:08:09Z)
- Copula Flows for Synthetic Data Generation [0.5801044612920815]
We propose to use a probabilistic model as a synthetic data generator.
We benchmark our method on both simulated and real data-sets in terms of density estimation.
arXiv Detail & Related papers (2021-01-03T10:06:23Z)
- Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset.
With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset.
In this work, we argue that standard Conditional GANs are not suitable for conditioning on only part of this ancillary information, and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.