VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data
- URL: http://arxiv.org/abs/2006.11941v1
- Date: Sun, 21 Jun 2020 23:47:32 GMT
- Title: VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data
- Authors: Chao Ma, Sebastian Tschiatschek, José Miguel Hernández-Lobato,
Richard Turner, Cheng Zhang
- Abstract summary: VAEM is a deep generative model that is trained in a two-stage manner.
We show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
- Score: 16.00692074660383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models often perform poorly in real-world applications due to
the heterogeneity of natural data sets. Heterogeneity arises from data
containing different types of features (categorical, ordinal, continuous, etc.)
and features of the same type having different marginal distributions. We
propose an extension of variational autoencoders (VAEs) called VAEM to handle
such heterogeneous data. VAEM is a deep generative model that is trained in a
two-stage manner such that the first stage provides a more uniform
representation of the data to the second stage, thereby sidestepping the
problems caused by heterogeneous data. We provide extensions of VAEM to handle
partially observed data, and demonstrate its performance in data generation,
missing data prediction and sequential feature selection tasks. Our results
show that VAEM broadens the range of real-world applications where deep
generative models can be successfully deployed.
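The two-stage idea is simple to state: first fit an independent "marginal VAE" per feature, so that every column, whatever its type or marginal distribution, is mapped to a comparable continuous code; then fit a second "dependency VAE" on those codes to capture correlations between features. Below is a minimal sketch of that training recipe, written in PyTorch as an illustrative assumption (the paper does not prescribe a framework); the toy data, layer sizes, and the shared Gaussian likelihood are all simplifications of the actual model.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """A small VAE used per feature in stage 1 and on the codes in stage 2."""
    def __init__(self, x_dim: int, z_dim: int, hidden: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def elbo_loss(x, x_hat, mu, logvar):
    # Gaussian reconstruction term plus analytic KL to a standard normal prior.
    rec = ((x - x_hat) ** 2).sum(-1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1)).mean()
    return rec + kl

def train(vae, x, steps=200, lr=1e-2):
    opt = torch.optim.Adam(vae.parameters(), lr=lr)
    for _ in range(steps):
        x_hat, mu, logvar = vae(x)
        loss = elbo_loss(x, x_hat, mu, logvar)
        opt.zero_grad(); loss.backward(); opt.step()

# Toy heterogeneous data: a skewed continuous column and a binary column.
# (The real model would use a Bernoulli likelihood for the binary column;
# a shared Gaussian likelihood keeps this sketch short.)
x = torch.cat([torch.randn(512, 1).exp(),
               (torch.rand(512, 1) < 0.3).float()], dim=1)

# Stage 1: fit an independent marginal VAE to each feature.
marginals = [TinyVAE(x_dim=1, z_dim=1) for _ in range(x.shape[1])]
for d, vae in enumerate(marginals):
    train(vae, x[:, d:d + 1])

# Stage 2: the dependency VAE is trained on the (more homogeneous)
# stage-1 latent codes rather than on the raw heterogeneous features.
with torch.no_grad():
    codes = torch.cat([vae.enc(x[:, d:d + 1]).chunk(2, -1)[0]
                       for d, vae in enumerate(marginals)], dim=1)
train(TinyVAE(x_dim=codes.shape[1], z_dim=2), codes)
```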
Related papers
- DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data [48.31817189858086]
We argue that generative data can expand the data distribution that the model can learn, thus mitigating overfitting.
We find that DiverGen significantly outperforms the strong model X-Paste, achieving +1.1 box AP and +1.1 mask AP across all categories, and +1.9 box AP and +2.5 mask AP for rare categories.
arXiv Detail & Related papers (2024-05-16T15:30:18Z)
- An improved tabular data generator with VAE-GMM integration [9.4491536689161]
We propose a novel Variational Autoencoder (VAE)-based model that addresses limitations of current approaches.
Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture.
We thoroughly validate our model on three real-world datasets with mixed data types, including two medically relevant ones.
arXiv Detail & Related papers (2024-04-12T12:31:06Z)
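To make the BGM-inside-a-VAE recipe above concrete: one common realization (the TVAE-style one the entry itself cites as inspiration) fits a Bayesian Gaussian mixture per continuous column and re-encodes each value as a mixture-mode indicator plus a within-mode offset before it enters the VAE. The sketch below shows only that preprocessing step; the column data and component count are invented for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# A bimodal column of the kind this preprocessing targets.
col = np.concatenate([rng.normal(-3, 0.5, 500), rng.normal(4, 1.0, 500)])

bgm = BayesianGaussianMixture(n_components=10, weight_concentration_prior=1e-2)
bgm.fit(col.reshape(-1, 1))

modes = bgm.predict(col.reshape(-1, 1))        # which mixture mode each value uses
mu = bgm.means_[modes, 0]
std = np.sqrt(bgm.covariances_[modes, 0, 0])
offset = ((col - mu) / (4 * std)).clip(-1, 1)  # normalized within-mode offset

# The VAE then consumes [one_hot(mode), offset] instead of the raw column,
# and the transform is inverted after decoding.
one_hot = np.eye(bgm.n_components)[modes]
encoded = np.concatenate([one_hot, offset[:, None]], axis=1)
```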
- Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
- Heterogeneous Multi-Task Gaussian Cox Processes [61.67344039414193]
We present a novel extension of multi-task Gaussian Cox processes for modeling heterogeneous correlated tasks jointly.
A multi-output Gaussian process (MOGP) prior over the parameters of the dedicated likelihoods for classification, regression, and point-process tasks can facilitate sharing of information between heterogeneous tasks.
We derive a mean-field approximation to realize closed-form iterative updates for estimating model parameters.
arXiv Detail & Related papers (2023-08-29T15:01:01Z)
- Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation [56.620403243640396]
Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data.
However, their performance deteriorates significantly when handling out-of-distribution (OoD) data.
We develop a simple yet effective method called Generative Interpolation to fuse generative models trained from multiple domains for synthesizing diverse OoD samples.
arXiv Detail & Related papers (2023-07-23T03:53:53Z)
- VTAE: Variational Transformer Autoencoder with Manifolds Learning [144.0546653941249]
Deep generative models have demonstrated successful applications in learning non-linear data distributions through a number of latent variables.
The nonlinearity of the generator means the latent space provides an unsatisfactory projection of the data space, which results in poor representation learning.
We show that geodesics and accurate computation can substantially improve the performance of deep generative models.
arXiv Detail & Related papers (2023-04-03T13:13:19Z)
- A Variational Autoencoder for Heterogeneous Temporal and Longitudinal Data [0.3749861135832073]
Recently proposed extensions to VAEs that can handle temporal and longitudinal data have applications in healthcare, behavioural modelling, and predictive maintenance.
We propose the heterogeneous longitudinal VAE (HL-VAE) that extends the existing temporal and longitudinal VAEs to heterogeneous data.
HL-VAE provides efficient inference for high-dimensional datasets and includes likelihood models for continuous, count, categorical, and ordinal data.
arXiv Detail & Related papers (2022-04-20T10:18:39Z)
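As a tiny illustration of per-type likelihood models such as those HL-VAE is described as using, one decoder head can emit the parameters of a Normal, Poisson, and Categorical distribution for the continuous, count, and categorical columns respectively. torch.distributions is an assumed implementation vehicle, the dimensions and targets are invented, and the ordinal head is omitted for brevity.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, Poisson, Categorical

z = torch.randn(32, 4)              # latent codes from some encoder
head = nn.Linear(4, 1 + 1 + 1 + 3)  # mu, log-sigma, log-rate, 3 class logits
p = head(z)

log_lik = (
    Normal(p[:, 0], p[:, 1].exp()).log_prob(torch.randn(32))               # continuous
    + Poisson(p[:, 2].exp()).log_prob(torch.randint(0, 9, (32,)).float())  # count
    + Categorical(logits=p[:, 3:]).log_prob(torch.randint(0, 3, (32,)))    # categorical
)
print(log_lik.mean())  # average heterogeneous log-likelihood of the fake targets
```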
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- Adapting deep generative approaches for getting synthetic data with realistic marginal distributions [0.0]
Deep generative models, such as variational autoencoders (VAEs), are a popular approach for creating synthetic datasets from original data.
We propose a novel method, pre-transformation variational autoencoders (PTVAEs), to specifically address bimodal and skewed data.
The results show that the PTVAE approach can outperform others in both bimodal and skewed data generation.
arXiv Detail & Related papers (2021-05-14T15:47:20Z)
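The pre-transformation idea above can be sketched with an off-the-shelf marginal transform: Gaussianize each skewed column before training a standard VAE, and invert the transform on generated samples. Using sklearn's PowerTransformer here is an assumption for illustration; PTVAE learns its own pre-transformations.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

pt = PowerTransformer(method="box-cox")  # Box-Cox requires strictly positive data
z = pt.fit_transform(skewed)             # roughly N(0, 1) per column

# ... train any standard VAE on `z` instead of `skewed` ...

samples_z = rng.normal(size=(10, 1))          # stand-in for VAE samples
samples_x = pt.inverse_transform(samples_z)   # back to the original skewed scale
```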
- Variational Selective Autoencoder: Learning from Partially-Observed Heterogeneous Data [45.23338389559936]
We propose the variational selective autoencoder (VSAE) to learn representations from partially-observed heterogeneous data.
VSAE learns the latent dependencies in heterogeneous data by modeling the joint distribution of observed data, unobserved data, and the imputation mask.
It results in a unified model for various downstream tasks including data generation and imputation.
arXiv Detail & Related papers (2021-02-25T04:39:13Z)
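As a rough illustration of the VSAE idea in the last entry, the sketch below encodes zero-imputed data together with its missingness mask and lets the decoder reconstruct both the data and the mask from one latent code. The shapes, losses, and simple concatenation encoder are assumptions; the paper's actual factorization over observed and unobserved variables is more refined.

```python
import torch
import torch.nn as nn

class MaskAwareVAE(nn.Module):
    def __init__(self, x_dim: int, z_dim: int = 4, hidden: int = 64):
        super().__init__()
        # The encoder sees zero-imputed data concatenated with the mask.
        self.enc = nn.Sequential(nn.Linear(2 * x_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))
        # The decoder models data and mask jointly from one latent code.
        self.dec_x = nn.Linear(z_dim, x_dim)
        self.dec_m = nn.Linear(z_dim, x_dim)  # logits for the mask

    def forward(self, x, mask):
        h = torch.cat([x * mask, mask], dim=-1)
        mu, logvar = self.enc(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec_x(z), self.dec_m(z), mu, logvar

x = torch.randn(256, 8)
mask = (torch.rand(256, 8) < 0.7).float()           # 1 = observed
model = MaskAwareVAE(x_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    x_hat, m_logit, mu, logvar = model(x, mask)
    rec = (mask * (x - x_hat) ** 2).sum(-1).mean()  # reconstruct observed entries only
    mask_nll = nn.functional.binary_cross_entropy_with_logits(
        m_logit, mask, reduction="none").sum(-1).mean()
    kl = (-0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1)).mean()
    loss = rec + mask_nll + kl
    opt.zero_grad(); loss.backward(); opt.step()

# Imputation: keep observed values, fill missing entries from the decoder.
with torch.no_grad():
    x_hat, _, _, _ = model(x, mask)
    x_imputed = mask * x + (1 - mask) * x_hat
```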
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.