Disentangling shared and private latent factors in multimodal
Variational Autoencoders
- URL: http://arxiv.org/abs/2403.06338v1
- Date: Sun, 10 Mar 2024 23:11:05 GMT
- Title: Disentangling shared and private latent factors in multimodal
Variational Autoencoders
- Authors: Kaspar M\"artens and Christopher Yau
- Abstract summary: Multimodal Variational Autoencoders, such as MVAE and MMVAE, are a natural choice for inferring those underlying latent factors and separating shared variation from private.
We demonstrate limitations of existing models and propose a modification that makes them more robust to modality-specific variation.
Our findings are supported by experiments on synthetic as well as various real-world multi-omics data sets.
- Score: 6.680930089714339
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models for multimodal data permit the identification of latent
factors that may be associated with important determinants of observed data
heterogeneity. Common or shared factors could be important for explaining
variation across modalities whereas other factors may be private and important
only for the explanation of a single modality. Multimodal Variational
Autoencoders, such as MVAE and MMVAE, are a natural choice for inferring those
underlying latent factors and separating shared variation from private. In this
work, we investigate their capability to reliably perform this disentanglement.
In particular, we highlight a challenging problem setting where
modality-specific variation dominates the shared signal. Taking a cross-modal
prediction perspective, we demonstrate limitations of existing models and
propose a modification that makes them more robust to modality-specific
variation. Our findings are supported by experiments on synthetic as well as
various real-world multi-omics data sets.
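MVAE and MMVAE differ in how they fuse the unimodal encoder posteriors into a joint posterior over the shared latent: MVAE takes a product of experts (PoE), MMVAE a mixture of experts (MoE). A minimal NumPy sketch of the two aggregation rules, for illustration only (the function and variable names are ours, not from the paper):

```python
import numpy as np

def poe(mus, variances):
    """Product-of-experts aggregation (as in MVAE): a product of Gaussian
    experts is Gaussian, with precisions that add and a precision-weighted
    mean."""
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = (1.0 / variances).sum(axis=0)
    var = 1.0 / precision
    mu = var * (mus / variances).sum(axis=0)
    return mu, var

def moe_sample(mus, variances, rng):
    """Mixture-of-experts aggregation (as in MMVAE): pick one modality's
    expert uniformly at random, then sample from that Gaussian."""
    k = rng.integers(len(mus))
    return rng.normal(mus[k], np.sqrt(variances[k]))

# Two unimodal posteriors over the same shared latent:
mu, var = poe([0.0, 2.0], [1.0, 1.0])   # mu = 1.0, var = 0.5
sample = moe_sample([0.0, 2.0], [1.0, 1.0], np.random.default_rng(0))
```

Because PoE precisions add, a single over-confident unimodal expert can dominate the joint posterior, which is one route by which strong modality-specific variation can distort the inferred shared factors.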
Related papers
- Bayesian Joint Additive Factor Models for Multiview Learning [7.254731344123118]
A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes.
We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components.
Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors.
arXiv Detail & Related papers (2024-06-02T15:35:45Z)
- From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals [27.95734153126108]
Existing methods for representation learning aim to disentangle the modality-shared and modality-specific latent variables.
We propose a general generation process, where the modality-shared and modality-specific latent variables are dependent.
Our MATE model is built on a temporally variational inference architecture with the modality-shared and modality-specific prior networks.
arXiv Detail & Related papers (2024-05-25T06:26:02Z)
- Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [95.49699178874683]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs).
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z)
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives [5.549794481031468]
Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research.
In this work, we consider a variational objective that can tightly approximate the data log-likelihood.
We develop more flexible aggregation schemes that avoid the inductive biases in PoE or MoE approaches.
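A common alternative to PoE/MoE is a permutation-invariant aggregator in the Deep-Sets style: encode each modality separately, then pool with an order-independent operation such as a sum. A hedged NumPy sketch (linear encoders and all names are illustrative assumptions, not the paper's actual scheme):

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(x, W):
    """Hypothetical per-modality encoder: a single nonlinear layer,
    standing in for a real encoder network."""
    return np.tanh(x @ W)

def aggregate(features):
    """Permutation-invariant pooling (Deep-Sets style): summing the
    per-modality feature vectors makes the result independent of the
    order of the observed modalities."""
    return np.sum(features, axis=0)

# Two modalities of different dimensionality, mapped to a shared feature space:
x_a, x_b = rng.normal(size=(1, 4)), rng.normal(size=(1, 6))
W_a, W_b = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))

h = aggregate([encode(x_a, W_a), encode(x_b, W_b)])
h_swapped = aggregate([encode(x_b, W_b), encode(x_a, W_a)])
assert np.allclose(h, h_swapped)  # pooling ignores modality order
```

Because the pooled statistic is learned rather than fixed to a product or mixture, such aggregators can sidestep the inductive biases of PoE and MoE, at the cost of extra parameters.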
arXiv Detail & Related papers (2023-09-01T10:32:21Z)
- Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z)
- Identifiable Latent Causal Content for Domain Adaptation under Latent Covariate Shift [82.14087963690561]
Multi-source domain adaptation (MSDA) addresses the challenge of learning a label prediction function for an unlabeled target domain.
We present an intricate causal generative model by introducing latent noises across domains, along with a latent content variable and a latent style variable.
The proposed approach showcases exceptional performance and efficacy on both simulated and real-world datasets.
arXiv Detail & Related papers (2022-08-30T11:25:15Z)
- Mixture-of-experts VAEs can disregard variation in surjective multimodal data [23.731871165711635]
We consider surjective data, where single datapoints from one modality describe multiple datapoints from another modality.
We theoretically and empirically demonstrate that multimodal VAEs with a mixture of experts posterior can struggle to capture variability in such surjective data.
arXiv Detail & Related papers (2022-04-11T16:22:51Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent Representations [24.3033562693679]
We introduce a disentangled multi-modal variational autoencoder (DMVAE) that uses a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities.
We demonstrate the utility of DMVAE on a semi-supervised learning task, where one of the modalities contains partial data labels.
Our experiments on several benchmarks indicate the importance of the private-shared disentanglement as well as the hybrid latent representation.
arXiv Detail & Related papers (2020-12-23T23:33:23Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.