Related papers: Composable Generative Models

URL: http://arxiv.org/abs/2102.09249v1
Date: Thu, 18 Feb 2021 10:11:29 GMT
Title: Composable Generative Models
Authors: Johan Leduc and Nicolas Grislain
Abstract summary: This paper focuses on synthetic data generation models with privacy preserving applications in mind. It introduces a novel architecture, the Composable Generative Model (CGM). The CGM has been evaluated on 13 datasets and compared to 14 recent generative models.
Score: 5.990174495635326
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Generative modeling has recently seen many exciting developments with the advent of deep generative architectures such as Variational Auto-Encoders (VAE) or Generative Adversarial Networks (GAN). The ability to draw synthetic i.i.d. observations with the same joint probability distribution as a given dataset has a wide range of applications including representation learning, compression or imputation. It appears that it also has many applications in privacy preserving data analysis, especially when used in conjunction with differential privacy techniques. This paper focuses on synthetic data generation models with privacy preserving applications in mind. It introduces a novel architecture, the Composable Generative Model (CGM) that is state-of-the-art in tabular data generation. Any conditional generative model can be used as a sub-component of the CGM, including CGMs themselves, allowing the generation of numerical, categorical data as well as images, text, or time series. The CGM has been evaluated on 13 datasets (6 standard datasets and 7 simulated) and compared to 14 recent generative models. It beats the state of the art in tabular data generation by a significant margin.

Related papers

Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable. We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data. Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data. CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys) [57.30228361181045]
This survey connects key advancements in recommender systems using Generative Models (Gen-RecSys). It covers: interaction-driven generative models; the use of large language models (LLM) and textual data for natural language recommendation; and the integration of multimodal models for generating and processing images/videos in RS. Our work highlights necessary paradigms for evaluating the impact and harm of Gen-RecSys and identifies open challenges.
arXiv Detail & Related papers (2024-03-31T06:57:57Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Language Models are Realistic Tabular Data Generators [15.851912974874116]
We propose GReaT (Generation of Realistic Tabular data), which exploits an auto-regressive generative large language model (LLM) to sample synthetic and yet highly realistic data. We demonstrate the effectiveness of the proposed approach in a series of experiments that quantify the validity and quality of the produced data samples from multiple angles.
arXiv Detail & Related papers (2022-10-12T15:03:28Z)
TTS-CGAN: A Transformer Time-Series Conditional GAN for Biosignal Data Augmentation [5.607676459156789]
We present TTS-CGAN, a conditional GAN model that can be trained on existing multi-class datasets and generate class-specific synthetic time-series sequences. Synthetic sequences generated by our model are indistinguishable from real ones, and can be used to complement or replace real signals of the same type.
arXiv Detail & Related papers (2022-06-28T01:01:34Z)
CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Vision Models [61.68061613161187]
This paper presents CARLA-GeAR, a tool for the automatic generation of synthetic datasets for evaluating the robustness of neural models against physical adversarial patches. The tool is built on the CARLA simulator, using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving. The paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world.
arXiv Detail & Related papers (2022-06-09T09:17:38Z)
DATGAN: Integrating expert knowledge into deep learning for synthetic tabular data [0.0]
Synthetic data can be used in various applications, such as correcting biased datasets or replacing scarce original data for simulation purposes. Deep learning models are data-driven and it is difficult to control the generation process. This article presents the Directed Acyclic Tabular GAN (DATGAN) to address these limitations.
arXiv Detail & Related papers (2022-03-07T16:09:03Z)
InvGAN: Invertible GANs [88.58338626299837]
InvGAN, short for Invertible GAN, successfully embeds real images to the latent space of a high quality generative model. This allows us to perform image inpainting, merging, and online data augmentation.
arXiv Detail & Related papers (2021-12-08T21:39:00Z)
Differentially Private Synthetic Medical Data Generation using Convolutional GANs [7.2372051099165065]
We develop a differentially private framework for synthetic data generation using Rényi differential privacy. Our approach builds on convolutional autoencoders and convolutional generative adversarial networks to preserve some of the critical characteristics of the generated synthetic data. We demonstrate that our model outperforms existing state-of-the-art models under the same privacy budget.
arXiv Detail & Related papers (2020-12-22T01:03:49Z)
Partially Conditioned Generative Adversarial Networks [75.08725392017698]
Generative Adversarial Networks (GANs) let one synthesise artificial datasets by implicitly modelling the underlying probability distribution of a real-world training dataset. With the introduction of Conditional GANs and their variants, these methods were extended to generating samples conditioned on ancillary information available for each sample within the dataset. In this work, we argue that standard Conditional GANs are not suitable for such a task and propose a new Adversarial Network architecture and training strategy.
arXiv Detail & Related papers (2020-07-06T15:59:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.