Variational Selective Autoencoder: Learning from Partially-Observed
Heterogeneous Data
- URL: http://arxiv.org/abs/2102.12679v1
- Date: Thu, 25 Feb 2021 04:39:13 GMT
- Title: Variational Selective Autoencoder: Learning from Partially-Observed
Heterogeneous Data
- Authors: Yu Gong and Hossein Hajimirsadeghi and Jiawei He and Thibaut Durand
and Greg Mori
- Abstract summary: We propose the variational selective autoencoder (VSAE) to learn representations from partially-observed heterogeneous data.
VSAE learns the latent dependencies in heterogeneous data by modeling the joint distribution of observed data, unobserved data, and the imputation mask.
It results in a unified model for various downstream tasks including data generation and imputation.
- Score: 45.23338389559936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from heterogeneous data poses challenges such as combining data from
various sources and of different types. Meanwhile, heterogeneous data are often
associated with missingness in real-world applications due to heterogeneity and
noise of input sources. In this work, we propose the variational selective
autoencoder (VSAE), a general framework to learn representations from
partially-observed heterogeneous data. VSAE learns the latent dependencies in
heterogeneous data by modeling the joint distribution of observed data,
unobserved data, and the imputation mask which represents how the data are
missing. It results in a unified model for various downstream tasks including
data generation and imputation. Evaluation on both low-dimensional and
high-dimensional heterogeneous datasets for these two tasks shows improvement
over state-of-the-art models.
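The abstract's core idea — jointly modeling observed entries, unobserved entries, and the imputation mask — can be caricatured in a few lines. The following is a minimal sketch in plain NumPy, not the authors' architecture: linear encoder/decoder maps, unit variances, and a single Monte Carlo sample stand in for the real networks, and all weight names (`W_enc`, `W_dec`, `W_mask`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_elbo(x, mask, W_enc, W_dec, W_mask):
    """Toy VSAE-style objective: reconstruct only the observed entries and
    model the imputation mask from the same latent code (linear maps,
    one reparameterized sample)."""
    # Encode from the observed part only (unobserved entries zeroed out).
    x_in = x * mask
    mu = x_in @ W_enc                        # mean of q(z | x_obs, mask)
    z = mu + rng.standard_normal(mu.shape)   # sample with unit posterior variance
    # Decode data values and mask probabilities from the shared latent.
    x_hat = z @ W_dec
    m_prob = sigmoid(z @ W_mask)
    # Gaussian reconstruction term, summed over observed entries only.
    rec = -0.5 * np.sum(mask * (x - x_hat) ** 2)
    # Bernoulli log-likelihood of the mask itself.
    mask_ll = np.sum(mask * np.log(m_prob + 1e-9)
                     + (1 - mask) * np.log(1 - m_prob + 1e-9))
    # KL(q || N(0, I)) with unit posterior variance reduces to 0.5 * ||mu||^2.
    kl = 0.5 * np.sum(mu ** 2)
    return rec + mask_ll - kl

d, k = 4, 2
x = rng.standard_normal((8, d))
mask = (rng.random((8, d)) < 0.7).astype(float)  # 1 = observed, 0 = missing
W_enc = rng.standard_normal((d, k)) * 0.1
W_dec = rng.standard_normal((k, d)) * 0.1
W_mask = rng.standard_normal((k, d)) * 0.1
print(masked_elbo(x, mask, W_enc, W_dec, W_mask))
```

The point of the sketch is the objective's three-way structure: a reconstruction term restricted to observed entries, a likelihood term for the mask itself (so the missingness pattern is modeled rather than ignored), and a KL regularizer on the latent.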
Related papers
- Causal Discovery on Dependent Binary Data [6.464898093190062]
We propose a decorrelation-based approach for causal graph learning on dependent binary data.
We develop an EM-like iterative algorithm to generate and decorrelate samples of the latent utility variables.
We demonstrate that the proposed decorrelation approach significantly improves the accuracy in causal graph learning.
arXiv Detail & Related papers (2024-12-28T21:55:42Z)
- Integrating Random Effects in Variational Autoencoders for Dimensionality Reduction of Correlated Data [9.990687944474738]
LMMVAE is a novel model which separates the classic VAE latent model into fixed and random parts.
It is shown to improve squared reconstruction error and negative likelihood loss significantly on unseen data.
arXiv Detail & Related papers (2024-12-22T07:20:17Z)
- Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
- Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning, instantiated on structured spaces, with a simple use of classical results from causal inference provides an effective practical solution.
We demonstrate that, under some assumptions, our model can handle more than one nuisance variable and enables analysis of pooled scientific datasets in scenarios that would otherwise require removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Bayesian data combination model with Gaussian process latent variable model for mixed observed variables under NMAR missingness [0.0]
It is difficult to obtain a "(quasi) single-source dataset" in which the variables of interest are observed simultaneously, so multiple datasets must instead be combined and used as a single-source dataset with missing variables.
We propose a data fusion method that does not assume that the datasets are homogeneous.
arXiv Detail & Related papers (2021-09-01T16:09:55Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data [16.00692074660383]
VAEM is a deep generative model that is trained in a two-stage manner.
We show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
arXiv Detail & Related papers (2020-06-21T23:47:32Z)
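The two-stage structure mentioned in the VAEM summary can be schematized as: first fit an independent marginal model per heterogeneous variable, then fit a single dependency model over the resulting per-variable latents. The sketch below is only a caricature of that structure — it replaces both stages' VAEs with linear stand-ins (standardization/centering for stage one, a PCA-style rank-1 autoencoder for stage two) and is not VAEM's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy heterogeneous data: one real-valued and one binary column.
n = 100
x_real = rng.standard_normal(n) * 3.0 + 5.0
x_bin = (rng.random(n) < 0.3).astype(float)

# Stage 1: an independent "marginal" model per variable, mapping each
# heterogeneous column onto a comparable latent scale (standardization
# for the real column, mean-centering for the binary one).
z1 = (x_real - x_real.mean()) / x_real.std()
z2 = x_bin - x_bin.mean()
Z = np.stack([z1, z2], axis=1)

# Stage 2: one shared dependency model over the stage-1 latents
# (a PCA-style linear autoencoder capturing cross-variable structure).
Zc = Z - Z.mean(axis=0)
U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
code = Zc @ Vt[:1].T      # 1-D shared latent code per sample
Z_hat = code @ Vt[:1]     # reconstruction in latent space
err = np.mean((Zc - Z_hat) ** 2)
print(err)
```

The separation matters because stage one absorbs the per-variable likelihood differences (Gaussian vs. Bernoulli, differing scales), so the stage-two model only has to capture dependencies between variables on a homogeneous latent scale.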
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.