Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent
Representations
- URL: http://arxiv.org/abs/2012.13024v1
- Date: Wed, 23 Dec 2020 23:33:23 GMT
- Title: Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent
Representations
- Authors: Mihee Lee, Vladimir Pavlovic
- Abstract summary: We introduce a disentangled multi-modal variational autoencoder (DMVAE) that utilizes a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities.
We demonstrate the utility of DMVAE on a semi-supervised learning task, where one of the modalities contains partial data labels.
Our experiments on several benchmarks indicate the importance of the private-shared disentanglement as well as the hybrid latent representation.
- Score: 24.3033562693679
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-modal generative models represent an important family of deep models,
whose goal is to facilitate representation learning on data with multiple views
or modalities. However, current deep multi-modal models focus on the inference
of shared representations, while neglecting the important private aspects of
data within individual modalities. In this paper, we introduce a disentangled
multi-modal variational autoencoder (DMVAE) that utilizes a disentangled VAE
strategy to separate the private and shared latent spaces of multiple
modalities. We specifically consider the instance where the latent factor may
be of both continuous and discrete nature, leading to the family of general
hybrid DMVAE models. We demonstrate the utility of DMVAE on a semi-supervised
learning task, where one of the modalities contains partial data labels, both
relevant and irrelevant to the other modality. Our experiments on several
benchmarks indicate the importance of the private-shared disentanglement as
well as the hybrid latent representation.
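The abstract describes giving each modality its own private latent alongside a latent shared across modalities. A minimal NumPy sketch of one common way to realize the shared part (product-of-experts aggregation of per-modality Gaussian posteriors, with reparameterized sampling) is shown below; the variable names and the product-of-experts choice are illustrative assumptions, not necessarily the paper's exact factorization:

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Combine per-modality Gaussian posteriors over the shared latent.
    Product of Gaussians: precisions add, means are precision-weighted."""
    mus = np.asarray(mus)
    precisions = np.exp(-np.asarray(logvars))   # 1 / sigma^2 per expert
    joint_precision = precisions.sum(axis=0)
    joint_mu = (mus * precisions).sum(axis=0) / joint_precision
    joint_logvar = -np.log(joint_precision)
    return joint_mu, joint_logvar

def reparameterize(mu, logvar, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

rng = np.random.default_rng(0)

# Hypothetical encoder outputs: each modality emits its own view of the
# shared latent's posterior, plus a modality-specific (private) posterior.
mu_s1, lv_s1 = np.zeros(4), np.zeros(4)   # modality 1's shared posterior
mu_s2, lv_s2 = np.ones(4), np.zeros(4)    # modality 2's shared posterior

mu_shared, lv_shared = product_of_experts([mu_s1, mu_s2], [lv_s1, lv_s2])
z_shared = reparameterize(mu_shared, lv_shared, rng)
z_private1 = reparameterize(np.zeros(3), np.zeros(3), rng)  # private to modality 1

# Each modality's decoder would consume the concatenation of its private
# latent and the shared latent, e.g. [z_private1, z_shared] for modality 1.
```

With two equal-variance experts, the joint mean lands midway between the per-modality means and the joint variance halves, which is the intuition behind letting multiple modalities sharpen the shared posterior.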
Related papers
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Disentangling shared and private latent factors in multimodal Variational Autoencoders [6.680930089714339]
Multimodal Variational Autoencoders, such as MVAE and MMVAE, are a natural choice for inferring those underlying latent factors and separating shared from private variation.
We demonstrate limitations of existing models, and propose a modification that makes them more robust to modality-specific variation.
Our findings are supported by experiments on synthetic as well as various real-world multi-omics data sets.
arXiv Detail & Related papers (2024-03-10T23:11:05Z)
- Mitigating Biases with Diverse Ensembles and Diffusion Models [99.6100669122048]
We propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs).
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z)
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
arXiv Detail & Related papers (2021-06-23T17:54:35Z)
- Generalized Multimodal ELBO [11.602089225841631]
Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research.
Existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models.
We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations.
arXiv Detail & Related papers (2021-05-06T07:05:00Z)
- Self-Supervised Multimodal Domino: in Search of Biomarkers for Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z)
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.