Generalized Multimodal ELBO
- URL: http://arxiv.org/abs/2105.02470v1
- Date: Thu, 6 May 2021 07:05:00 GMT
- Title: Generalized Multimodal ELBO
- Authors: Thomas M. Sutter, Imant Daunhawer, and Julia E. Vogt
- Abstract summary: Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research.
Existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models.
We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations.
- Score: 11.602089225841631
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multiple data types naturally co-occur when describing real-world phenomena
and learning from them is a long-standing goal in machine learning research.
However, existing self-supervised generative models approximating an ELBO are
not able to fulfill all desired requirements of multimodal models: their
posterior approximation functions lead to a trade-off between the semantic
coherence and the ability to learn the joint data distribution. We propose a
new, generalized ELBO formulation for multimodal data that overcomes these
limitations. The new objective encompasses two previous methods as special
cases and combines their benefits without compromises. In extensive
experiments, we demonstrate the advantage of the proposed method compared to
state-of-the-art models in self-supervised, generative learning tasks.
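The abstract does not name the two special cases. In the paper they are the product-of-experts joint posterior (as in MVAE) and the mixture-of-experts joint posterior (as in MMVAE), and the generalized posterior is a mixture over products of experts taken on subsets of modalities. The following is a minimal NumPy sketch of that construction for diagonal Gaussian encoders, not the authors' implementation. The function names are hypothetical, the mixture is assumed to be uniform over all non-empty subsets, and the prior expert that some implementations add to each product is omitted.

```python
from itertools import combinations

import numpy as np


def product_of_experts(mus, logvars):
    """Fuse diagonal Gaussian experts by precision weighting."""
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))
    joint_var = 1.0 / precisions.sum(axis=0)
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, np.log(joint_var)


def subset_posteriors(mus, logvars):
    """Return one fused Gaussian per non-empty subset of modalities."""
    mus = np.asarray(mus, dtype=float)
    logvars = np.asarray(logvars, dtype=float)
    components = {}
    for size in range(1, len(mus) + 1):
        for subset in combinations(range(len(mus)), size):
            idx = list(subset)
            components[subset] = product_of_experts(mus[idx], logvars[idx])
    return components


def sample_mixture(components, rng):
    """Sample the uniform mixture: pick a subset, then sample its Gaussian."""
    keys = list(components)
    mu, logvar = components[keys[rng.integers(len(keys))]]
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


# Two modalities with unit-variance encoders centred at 0 and 2.
comps = subset_posteriors([[0.0, 0.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]])
z = sample_mixture(comps, np.random.default_rng(0))
```

Keeping only the singleton subsets gives a mixture of unimodal experts, and keeping only the full subset gives a single product of experts, which is how the two earlier posteriors appear as special cases in this sketch.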
Related papers
- Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances from multiple modalities when only a few labeled examples are available.
We propose a Generative Transfer Learning framework consisting of two stages: the first involves training on abundant unimodal data, and the second focuses on transfer learning to adapt to novel data.
Our findings demonstrate that GTL has superior performance compared to state-of-the-art methods across four distinct multi-modal datasets.
arXiv Detail & Related papers (2024-10-14T16:09:38Z) - Learning Multimodal Latent Generative Models with Energy-Based Prior [3.6648642834198797]
We propose a novel framework that integrates the latent generative model with the EBM.
This approach results in a more expressive and informative prior that better captures information across multiple modalities.
arXiv Detail & Related papers (2024-09-30T01:38:26Z) - VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition [3.4923338594757674]
Large language models (LLMs) can be used to train a model capable of extracting various types of entities.
In this paper, we utilize the open-sourced LLM LLaMA2 as the backbone model, and design specific instructions to distinguish between different types of entities and datasets.
Our model VANER, trained with a small partition of parameters, significantly outperforms previous LLM-based models and, for the first time for an LLM-based model, surpasses the majority of conventional state-of-the-art BioNER systems.
arXiv Detail & Related papers (2024-04-27T09:00:39Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - On the Limitations of Multimodal VAEs [9.449650062296824]
Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data.
Despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs.
arXiv Detail & Related papers (2021-10-08T13:28:28Z) - Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models [86.9292779620645]
We develop a contrastive framework for generative model learning, allowing us to train the model not just by the commonality between modalities, but by the distinction between "related" and "unrelated" multimodal data.
Under our proposed framework, the generative model can accurately identify related samples from unrelated ones, making it possible to make use of the plentiful unlabeled, unpaired multimodal data.
arXiv Detail & Related papers (2020-07-02T15:08:11Z) - Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence [20.23920009396818]
We propose a novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions.
It simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior.
In extensive experiments, we demonstrate the advantage of the proposed mmJSD model compared to previous work in unsupervised, generative learning tasks.
arXiv Detail & Related papers (2020-06-15T09:30:15Z)
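The Jensen-Shannon divergence for multiple distributions mentioned in the last entry can be illustrated as the weighted average KL divergence of each distribution from their mixture. The following is a small sketch for discrete distributions only, assuming strictly positive weights that sum to one. The function names are hypothetical, and this is not the mmJSD objective itself, which works with continuous latent posteriors and a dynamic prior.

```python
import numpy as np


def kl(p, q):
    """KL(p || q) in nats for discrete distributions, skipping zero-mass bins of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def generalized_jsd(dists, weights=None):
    """Weighted Jensen-Shannon divergence of several discrete distributions."""
    dists = np.asarray(dists, dtype=float)
    n = len(dists)
    if weights is None:
        weights = np.full(n, 1.0 / n)  # uniform weights by default
    else:
        weights = np.asarray(weights, dtype=float)
    mixture = weights @ dists  # the weighted average distribution
    return sum(w * kl(p, mixture) for w, p in zip(weights, dists))
```

With uniform weights the value is zero when all distributions coincide and reaches log(n) nats when the n distributions have disjoint support.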
This list is automatically generated from the titles and abstracts of the papers on this site.