Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent
Representations
- URL: http://arxiv.org/abs/2012.13024v1
- Date: Wed, 23 Dec 2020 23:33:23 GMT
- Title: Private-Shared Disentangled Multimodal VAE for Learning of Hybrid Latent
Representations
- Authors: Mihee Lee, Vladimir Pavlovic
- Abstract summary: We introduce a disentangled multi-modal variational autoencoder (DMVAE) that utilizes a disentangled VAE strategy to separate the private and shared latent spaces of multiple modalities.
We demonstrate the utility of DMVAE on a semi-supervised learning task, where one of the modalities contains partial data labels.
Our experiments on several benchmarks indicate the importance of the private-shared disentanglement as well as the hybrid latent representation.
- Score: 24.3033562693679
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Multi-modal generative models represent an important family of deep models,
whose goal is to facilitate representation learning on data with multiple views
or modalities. However, current deep multi-modal models focus on the inference
of shared representations, while neglecting the important private aspects of
data within individual modalities. In this paper, we introduce a disentangled
multi-modal variational autoencoder (DMVAE) that utilizes a disentangled VAE
strategy to separate the private and shared latent spaces of multiple
modalities. We specifically consider the instance where the latent factor may
be of both continuous and discrete nature, leading to the family of general
hybrid DMVAE models. We demonstrate the utility of DMVAE on a semi-supervised
learning task, where one of the modalities contains partial data labels, both
relevant and irrelevant to the other modality. Our experiments on several
benchmarks indicate the importance of the private-shared disentanglement as
well as the hybrid latent representation.
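To make the private-shared and hybrid (continuous plus discrete) structure concrete, the following is a minimal PyTorch sketch of a two-modality VAE of this kind: each encoder outputs a private Gaussian latent together with shared continuous and discrete factors, the shared Gaussian posteriors are fused with a product of experts, and the discrete factor is sampled with a Gumbel-Softmax relaxation. The layer sizes, the product-of-experts fusion, and the Gumbel-Softmax choice are assumptions made for illustration, not details taken from the paper.
```python
# Minimal sketch of a private-shared multimodal VAE with a hybrid
# (continuous + discrete) shared latent. Sizes, the product-of-experts
# fusion, and the Gumbel-Softmax relaxation are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """Maps one modality to a private Gaussian latent and shared (Gaussian + categorical) factors."""
    def __init__(self, x_dim, private_dim=8, shared_cont_dim=8, shared_disc_dim=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.private_mu = nn.Linear(256, private_dim)
        self.private_logvar = nn.Linear(256, private_dim)
        self.shared_mu = nn.Linear(256, shared_cont_dim)
        self.shared_logvar = nn.Linear(256, shared_cont_dim)
        self.shared_logits = nn.Linear(256, shared_disc_dim)  # discrete shared factor

    def forward(self, x):
        h = self.net(x)
        return (self.private_mu(h), self.private_logvar(h),
                self.shared_mu(h), self.shared_logvar(h),
                self.shared_logits(h))


def reparam_gaussian(mu, logvar):
    # Standard reparameterization trick for a diagonal Gaussian.
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()


def reparam_gumbel(logits, tau=0.67):
    # Relaxed one-hot sample for the discrete shared factor.
    return F.gumbel_softmax(logits, tau=tau, hard=False)


class DMVAESketch(nn.Module):
    def __init__(self, x1_dim=784, x2_dim=784):
        super().__init__()
        self.enc1 = ModalityEncoder(x1_dim)
        self.enc2 = ModalityEncoder(x2_dim)
        z_dim = 8 + 8 + 10  # private + shared continuous + shared discrete
        self.dec1 = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x1_dim))
        self.dec2 = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x2_dim))

    def forward(self, x1, x2):
        p1_mu, p1_lv, s1_mu, s1_lv, s1_logits = self.enc1(x1)
        p2_mu, p2_lv, s2_mu, s2_lv, s2_logits = self.enc2(x2)

        # Fuse the shared Gaussian posteriors with a product of experts
        # (one common choice; the "+ 1.0" is the standard-normal prior expert).
        prec1, prec2 = (-s1_lv).exp(), (-s2_lv).exp()
        s_var = 1.0 / (prec1 + prec2 + 1.0)
        s_mu = s_var * (prec1 * s1_mu + prec2 * s2_mu)

        z_shared_c = reparam_gaussian(s_mu, s_var.log())
        # Summing logits fuses the categorical posteriors (product of categoricals).
        z_shared_d = reparam_gumbel(s1_logits + s2_logits)
        z1 = torch.cat([reparam_gaussian(p1_mu, p1_lv), z_shared_c, z_shared_d], dim=-1)
        z2 = torch.cat([reparam_gaussian(p2_mu, p2_lv), z_shared_c, z_shared_d], dim=-1)
        return self.dec1(z1), self.dec2(z2)


model = DMVAESketch()
x1, x2 = torch.randn(4, 784), torch.randn(4, 784)
recon1, recon2 = model(x1, x2)
print(recon1.shape, recon2.shape)  # torch.Size([4, 784]) twice
```
Concatenating the shared sample with each modality's private sample before decoding is what lets each decoder keep modality-specific detail while still sharing common content across modalities.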
Related papers
- Learning Multimodal Latent Generative Models with Energy-Based Prior [3.6648642834198797]
We propose a novel framework that integrates the latent generative model with the EBM.
This approach results in a more expressive and informative prior, better capturing information across multiple modalities.
arXiv Detail & Related papers (2024-09-30T01:38:26Z) - U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z) - Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts
in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z) - Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives [5.549794481031468]
Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research.
In this work, we consider a variational objective that can tightly approximate the data log-likelihood.
We develop more flexible aggregation schemes that avoid the inductive biases of product-of-experts (PoE) or mixture-of-experts (MoE) approaches.
arXiv Detail & Related papers (2023-09-01T10:32:21Z) - Score-Based Multimodal Autoencoder [0.9208007322096533]
Multimodal Variational Autoencoders (VAEs) facilitate the construction of a tractable posterior within the latent space given multiple modalities.
Previous studies have shown that as the number of modalities increases, the generative quality of each modality declines.
In this study, we explore an alternative approach to enhance the generative performance of multimodal VAEs by jointly modeling the latent space of independently trained unimodal VAEs.
arXiv Detail & Related papers (2023-05-25T04:43:47Z) - Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns individual representations from each modality, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions (a generic sketch of such a contrastive objective appears after this list).
arXiv Detail & Related papers (2022-09-02T08:59:57Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables the IB to capture the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
arXiv Detail & Related papers (2021-06-23T17:54:35Z) - Self-Supervised Multimodal Domino: in Search of Biomarkers for
Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z) - Relating by Contrasting: A Data-efficient Framework for Multimodal
Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
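As referenced in the MCLEA entry above, the sketch below illustrates generic intra- and inter-modal contrastive objectives using simple InfoNCE terms; the embedding dimensions, temperature, and equal loss weighting are illustrative assumptions and do not reproduce MCLEA's actual objective.
```python
# Generic sketch of intra-/inter-modal contrastive objectives (InfoNCE form).
# Encoders are replaced by random stand-in embeddings; temperature and loss
# weighting are illustrative assumptions, not any specific paper's recipe.
import torch
import torch.nn.functional as F


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss where each anchor's positive is the same-index row of `positives`."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                     # (N, N) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)   # diagonal entries are positives
    return F.cross_entropy(logits, targets)


def multimodal_contrastive_loss(z_img, z_txt, z_img_aug, z_txt_aug):
    # Inter-modal term: pull together the two modalities' embeddings of the same entity.
    inter = info_nce(z_img, z_txt) + info_nce(z_txt, z_img)
    # Intra-modal term: pull each embedding toward an augmented view of itself.
    intra = info_nce(z_img, z_img_aug) + info_nce(z_txt, z_txt_aug)
    return inter + intra


# Toy usage with random embeddings standing in for per-modality encoders.
N, d = 32, 64
z_img, z_txt = torch.randn(N, d), torch.randn(N, d)
loss = multimodal_contrastive_loss(z_img, z_txt,
                                   z_img + 0.1 * torch.randn(N, d),
                                   z_txt + 0.1 * torch.randn(N, d))
print(float(loss))
```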
This list is automatically generated from the titles and abstracts of the papers on this site.