Improving Multimodal Joint Variational Autoencoders through Normalizing Flows and Correlation Analysis
- URL: http://arxiv.org/abs/2305.11832v1
- Date: Fri, 19 May 2023 17:15:34 GMT
- Title: Improving Multimodal Joint Variational Autoencoders through Normalizing Flows and Correlation Analysis
- Authors: Agathe Senellart, Clément Chadebec, Stéphanie Allassonnière
- Abstract summary: The unimodal posteriors are conditioned on the Deep Canonical Correlation Analysis embeddings.
We also use Normalizing Flows to enrich the unimodal posteriors and achieve more diverse data generation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new multimodal variational autoencoder that can
generate from the joint distribution and conditionally on any number of
complex modalities. The unimodal posteriors are conditioned on Deep Canonical
Correlation Analysis embeddings, which preserve the shared information across
modalities and lead to more coherent cross-modal generations. Furthermore, we
use Normalizing Flows to enrich the unimodal posteriors and achieve more
diverse data generation. Finally, we propose a Product of Experts for
inferring one modality from several others, which makes the model scalable to
any number of modalities. We demonstrate that our method improves likelihood
estimates, the diversity of generations, and in particular coherence metrics
for conditional generation on several datasets.
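To make the Product-of-Experts step concrete, here is a minimal sketch of how
Gaussian unimodal posteriors can be fused by multiplying their densities; the
function name, tensor shapes, and the optional standard-normal prior expert
are illustrative assumptions, not the authors' implementation.

```python
# Minimal Product-of-Experts (PoE) fusion of Gaussian experts N(mu_i, var_i).
# The product of Gaussian densities is Gaussian, with precision equal to the
# sum of the experts' precisions; any subset of modalities can be combined.
import torch

def product_of_experts(mus, logvars, include_prior=True):
    # mus, logvars: (num_experts, batch, latent_dim)
    if include_prior:
        # A standard-normal "prior expert" keeps the product well defined
        # even when a single modality is observed (an assumption here).
        mus = torch.cat([torch.zeros_like(mus[:1]), mus], dim=0)
        logvars = torch.cat([torch.zeros_like(logvars[:1]), logvars], dim=0)
    precisions = torch.exp(-logvars)               # 1 / var_i
    joint_var = 1.0 / precisions.sum(dim=0)        # combined variance
    joint_mu = joint_var * (mus * precisions).sum(dim=0)
    return joint_mu, torch.log(joint_var)
```

Because the fusion is a simple sum over experts, observing one more modality
only adds one more term, which is what makes this style of inference scalable
in the number of modalities.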
Related papers
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds [5.549794481031468]
Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research.
In this work, we consider a variational bound that can tightly approximate the data log-likelihood.
We develop more flexible aggregation schemes that generalize Product-of-Experts (PoE) or Mixture-of-Experts (MoE) approaches by combining encoded features from different modalities with permutation-invariant neural networks.
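As a rough illustration of permutation-invariant aggregation in the DeepSets
spirit, the sketch below encodes each modality's features, pools them with a
sum (so the output is invariant to modality order), and maps the pooled
feature onward; the class name, shapes, and layer sizes are illustrative
assumptions, not the paper's architecture.

```python
# DeepSets-style permutation-invariant aggregation of modality features:
# rho(sum_i phi(x_i)) is unchanged under any reordering of the inputs.
import torch
import torch.nn as nn

class PermutationInvariantAggregator(nn.Module):
    def __init__(self, feat_dim, hidden_dim, out_dim):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim))
        self.rho = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, out_dim))

    def forward(self, features):
        # features: (num_modalities, batch, feat_dim); summing over the
        # modality axis removes any dependence on input order.
        return self.rho(self.phi(features).sum(dim=0))
```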
arXiv Detail & Related papers (2023-09-01T10:32:21Z)
- Multi-modal Latent Diffusion [8.316365279740188]
Multi-modal Variational Autoencoders are a popular family of models that aim to learn a joint representation of the different modalities.
Existing approaches suffer from a coherence-quality tradeoff, where models with good generation quality lack generative coherence across modalities.
We propose a novel method that uses a set of independently trained, uni-modal, deterministic autoencoders.
arXiv Detail & Related papers (2023-06-07T14:16:44Z)
- Score-Based Multimodal Autoencoders [4.594159253008448]
Multimodal Variational Autoencoders (VAEs) facilitate the construction of a tractable posterior within the latent space, given multiple modalities.
In this study, we explore an alternative approach to enhance the generative performance of multimodal VAEs by jointly modeling the latent space of unimodal VAEs.
Our model combines the superior generative quality of unimodal VAEs with coherent integration across different modalities.
arXiv Detail & Related papers (2023-05-25T04:43:47Z)
- Generalizing Multimodal Variational Methods to Sets [35.69942798534849]
This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space.
By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization.
arXiv Detail & Related papers (2022-12-19T23:50:19Z)
- Attention Bottlenecks for Multimodal Fusion [90.75885715478054]
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
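For intuition only, the sketch below shows the bottleneck idea in one fusion
layer: each modality attends over its own tokens plus a small set of shared
bottleneck tokens, and the per-modality bottleneck updates are averaged; all
names, shapes, and the two-modality setup are assumptions, not the paper's
exact architecture.

```python
# One fusion layer where cross-modal information must pass through a few
# shared "bottleneck" tokens rather than full pairwise cross-attention.
import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, audio, video, bottleneck):
        # Each stream self-attends over its own tokens plus the bottleneck.
        a_in = torch.cat([audio, bottleneck], dim=1)
        v_in = torch.cat([video, bottleneck], dim=1)
        a_out, _ = self.attn_a(a_in, a_in, a_in)
        v_out, _ = self.attn_v(v_in, v_in, v_in)
        n_b = bottleneck.shape[1]
        # Averaging the bottleneck updates is the only cross-modal exchange.
        new_b = (a_out[:, -n_b:] + v_out[:, -n_b:]) / 2
        return a_out[:, :-n_b], v_out[:, :-n_b], new_b
```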
arXiv Detail & Related papers (2021-06-30T22:44:12Z)
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
arXiv Detail & Related papers (2021-06-23T17:54:35Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [55.28436972267793]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
- Learning more expressive joint distributions in multimodal variational methods [0.17188280334580194]
We introduce a method that improves the representational capacity of multimodal variational methods using normalizing flows.
We demonstrate that the model improves on state-of-the-art multimodal methods based on variational inference on various computer vision tasks.
We also show that learning more powerful approximate joint distributions improves the quality of the generated samples.
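As an illustration of how a flow adds expressiveness, below is a minimal
planar-flow step with its closed-form log-det-Jacobian, which is the quantity
needed to correct the variational bound; this is a generic sketch, not the
paper's specific flow architecture.

```python
# One planar normalizing-flow step: f(z) = z + u * tanh(w^T z + b).
# Applying a chain of such steps to samples from a simple Gaussian posterior
# yields a more expressive distribution with a tractable log-density change.
import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.01)
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # z: (batch, dim). Invertibility needs u.w >= -1 (not enforced here).
        lin = z @ self.w + self.b                      # (batch,)
        f_z = z + self.u * torch.tanh(lin).unsqueeze(-1)
        psi = (1.0 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w
        log_det = torch.log(torch.abs(1.0 + psi @ self.u) + 1e-8)
        return f_z, log_det
```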
arXiv Detail & Related papers (2020-09-08T11:45:27Z)
- Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z)
- Towards Multimodal Response Generation with Exemplar Augmentation and Curriculum Optimization [73.45742420178196]
We propose a novel multimodal response generation framework with exemplar augmentation and curriculum optimization.
Our model achieves significant improvements compared to strong baselines in terms of diversity and relevance.
arXiv Detail & Related papers (2020-04-26T16:29:06Z)