Variational Selective Autoencoder: Learning from Partially-Observed
Heterogeneous Data
- URL: http://arxiv.org/abs/2102.12679v1
- Date: Thu, 25 Feb 2021 04:39:13 GMT
- Title: Variational Selective Autoencoder: Learning from Partially-Observed
Heterogeneous Data
- Authors: Yu Gong and Hossein Hajimirsadeghi and Jiawei He and Thibaut Durand
and Greg Mori
- Abstract summary: We propose the variational selective autoencoder (VSAE) to learn representations from partially-observed heterogeneous data.
VSAE learns the latent dependencies in heterogeneous data by modeling the joint distribution of observed data, unobserved data, and the imputation mask.
It results in a unified model for various downstream tasks including data generation and imputation.
- Score: 45.23338389559936
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from heterogeneous data poses challenges such as combining data from
various sources and of different types. Meanwhile, heterogeneous data are often
associated with missingness in real-world applications due to heterogeneity and
noise of input sources. In this work, we propose the variational selective
autoencoder (VSAE), a general framework to learn representations from
partially-observed heterogeneous data. VSAE learns the latent dependencies in
heterogeneous data by modeling the joint distribution of observed data,
unobserved data, and the imputation mask which represents how the data are
missing. It results in a unified model for various downstream tasks including
data generation and imputation. Evaluation on both low-dimensional and
high-dimensional heterogeneous datasets for these two tasks shows improvement
over state-of-the-art models.
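The abstract's core idea — jointly modeling observed entries, unobserved entries, and the imputation mask — can be caricatured in a few lines. The following is a minimal sketch in plain NumPy, not the authors' architecture: linear encoder/decoder maps, unit variances, and a single Monte Carlo sample stand in for the real networks, and all weight names (`W_enc`, `W_dec`, `W_mask`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_elbo(x, mask, W_enc, W_dec, W_mask):
    """Toy VSAE-style objective: reconstruct only the observed entries and
    model the imputation mask from the same latent code (linear maps,
    one reparameterized sample)."""
    # Encode from the observed part only (unobserved entries zeroed out).
    x_in = x * mask
    mu = x_in @ W_enc                        # mean of q(z | x_obs, mask)
    z = mu + rng.standard_normal(mu.shape)   # sample with unit posterior variance
    # Decode data values and mask probabilities from the shared latent.
    x_hat = z @ W_dec
    m_prob = sigmoid(z @ W_mask)
    # Gaussian reconstruction term, summed over observed entries only.
    rec = -0.5 * np.sum(mask * (x - x_hat) ** 2)
    # Bernoulli log-likelihood of the mask itself.
    mask_ll = np.sum(mask * np.log(m_prob + 1e-9)
                     + (1 - mask) * np.log(1 - m_prob + 1e-9))
    # KL(q || N(0, I)) with unit posterior variance reduces to 0.5 * ||mu||^2.
    kl = 0.5 * np.sum(mu ** 2)
    return rec + mask_ll - kl

d, k = 4, 2
x = rng.standard_normal((8, d))
mask = (rng.random((8, d)) < 0.7).astype(float)  # 1 = observed, 0 = missing
W_enc = rng.standard_normal((d, k)) * 0.1
W_dec = rng.standard_normal((k, d)) * 0.1
W_mask = rng.standard_normal((k, d)) * 0.1
print(masked_elbo(x, mask, W_enc, W_dec, W_mask))
```

The point of the sketch is the objective's three-way structure: a reconstruction term restricted to observed entries, a likelihood term for the mask itself (so the missingness pattern is modeled rather than ignored), and a KL regularizer on the latent.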
Related papers
- Causal Discovery on Dependent Binary Data [6.464898093190062]
We propose a decorrelation-based approach for causal graph learning on dependent binary data.
We develop an EM-like iterative algorithm to generate and decorrelate samples of the latent utility variables.
We demonstrate that the proposed decorrelation approach significantly improves the accuracy in causal graph learning.
arXiv Detail & Related papers (2024-12-28T21:55:42Z)
- Integrating Random Effects in Variational Autoencoders for Dimensionality Reduction of Correlated Data [9.990687944474738]
LMMVAE is a novel model which separates the classic VAE latent model into fixed and random parts.
It is shown to improve squared reconstruction error and negative likelihood loss significantly on unseen data.
arXiv Detail & Related papers (2024-12-22T07:20:17Z)
- Learning Divergence Fields for Shift-Robust Graph Representations [73.11818515795761]
In this work, we propose a geometric diffusion model with learnable divergence fields for the challenging problem with interdependent data.
We derive a new learning objective through causal inference, which can guide the model to learn generalizable patterns of interdependence that are insensitive across domains.
arXiv Detail & Related papers (2024-06-07T14:29:21Z)
- Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG).
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning, instantiated on structured spaces, with a simple use of classical results from causal inference provides an effective practical solution.
We demonstrate that, under some assumptions, our model can handle more than one nuisance variable and enables analysis of pooled scientific datasets in scenarios that would otherwise require removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Bayesian data combination model with Gaussian process latent variable model for mixed observed variables under NMAR missingness [0.0]
It is difficult to obtain a "(quasi) single-source dataset" in which the variables of interest are observed simultaneously, so multiple datasets must instead be combined and used as a single-source dataset with missing variables.
We propose a data fusion method that does not assume that the datasets are homogeneous.
arXiv Detail & Related papers (2021-09-01T16:09:55Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- VAEM: a Deep Generative Model for Heterogeneous Mixed Type Data [16.00692074660383]
VAEM is a deep generative model that is trained in a two-stage manner.
We show that VAEM broadens the range of real-world applications where deep generative models can be successfully deployed.
arXiv Detail & Related papers (2020-06-21T23:47:32Z)
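The two-stage structure mentioned in the VAEM summary can be schematized as: first fit an independent marginal model per heterogeneous variable, then fit a single dependency model over the resulting per-variable latents. The sketch below is only a caricature of that structure — it replaces both stages' VAEs with linear stand-ins (standardization/centering for stage one, a PCA-style rank-1 autoencoder for stage two) and is not VAEM's actual model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy heterogeneous data: one real-valued and one binary column.
n = 100
x_real = rng.standard_normal(n) * 3.0 + 5.0
x_bin = (rng.random(n) < 0.3).astype(float)

# Stage 1: an independent "marginal" model per variable, mapping each
# heterogeneous column onto a comparable latent scale (standardization
# for the real column, mean-centering for the binary one).
z1 = (x_real - x_real.mean()) / x_real.std()
z2 = x_bin - x_bin.mean()
Z = np.stack([z1, z2], axis=1)

# Stage 2: one shared dependency model over the stage-1 latents
# (a PCA-style linear autoencoder capturing cross-variable structure).
Zc = Z - Z.mean(axis=0)
U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
code = Zc @ Vt[:1].T      # 1-D shared latent code per sample
Z_hat = code @ Vt[:1]     # reconstruction in latent space
err = np.mean((Zc - Z_hat) ** 2)
print(err)
```

The separation matters because stage one absorbs the per-variable likelihood differences (Gaussian vs. Bernoulli, differing scales), so the stage-two model only has to capture dependencies between variables on a homogeneous latent scale.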
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.