Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments
- URL: http://arxiv.org/abs/2211.03587v1
- Date: Mon, 7 Nov 2022 14:27:38 GMT
- Title: Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments
- Authors: Abhinav Joshi and Naman Gupta and Jinang Shah and Binod Bhattarai and Ashutosh Modi and Danail Stoyanov
- Abstract summary: We propose a novel method for multimodal representation learning in a noisy environment via the generalized product of experts technique.
In the proposed method, we train a separate network for each modality to assess the credibility of information coming from that modality.
We attain state-of-the-art performance on two challenging benchmarks: multimodal 3D hand-pose estimation and multimodal surgical video segmentation.
- Score: 18.14974353615421
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Real-world applications and settings typically involve interaction between
different modalities (e.g., video, speech, text). To process such multimodal
information automatically and use it in end applications, Multimodal
Representation Learning (MRL) has emerged as an active area of research. MRL
involves learning reliable and robust representations of information from
heterogeneous sources and fusing them. In practice, however, the data acquired
from different sources are typically noisy. In extreme cases, noise of large
magnitude can completely alter the semantics of the data, leading to
inconsistencies in the parallel multimodal data. In this paper,
we propose a novel method for multimodal representation learning in a noisy
environment via the generalized product of experts technique. In the proposed
method, we train a separate network for each modality to assess the credibility
of information coming from that modality, and subsequently, the contribution
from each modality is dynamically varied while estimating the joint
distribution. We evaluate our method on two challenging benchmarks from two
diverse domains: multimodal 3D hand-pose estimation and multimodal surgical
video segmentation. We attain state-of-the-art performance on both benchmarks.
Our extensive quantitative and qualitative evaluations show the advantages of
our method compared to previous approaches.
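For diagonal Gaussian experts, the credibility-weighted fusion described above can be written compactly: each modality's expert contributes its precision scaled by a learned credibility weight, and the joint distribution is their weighted product. Below is a minimal PyTorch sketch under that assumption; the function name `fuse_gpoe`, the tensor shapes, and the per-modality credibility scores are illustrative and not taken from the paper's released code.

```python
# Minimal sketch of generalized Product-of-Experts (gPoE) fusion for diagonal
# Gaussian experts. Shapes and names are illustrative; the paper's actual
# encoders, credibility networks, and training objective are not reproduced.
import torch


def fuse_gpoe(means, logvars, credibilities, eps=1e-8):
    """Fuse per-modality Gaussians N(mean, var) into one joint Gaussian.

    means, logvars:  (M, B, D) -- one diagonal Gaussian expert per modality.
    credibilities:   (M, B, 1) -- per-modality weights in [0, 1], e.g. the
                     output of a small network scoring how trustworthy
                     (noise-free) each modality currently looks.
    """
    precisions = torch.exp(-logvars)              # 1 / sigma_m^2
    weighted_prec = credibilities * precisions    # alpha_m / sigma_m^2
    joint_prec = weighted_prec.sum(dim=0) + eps   # sum over modalities
    joint_var = 1.0 / joint_prec
    joint_mean = joint_var * (weighted_prec * means).sum(dim=0)
    return joint_mean, torch.log(joint_var)


# Toy usage: two modalities, the second judged unreliable and down-weighted.
M, B, D = 2, 4, 16
means = torch.randn(M, B, D)
logvars = torch.zeros(M, B, D)
cred = torch.tensor([1.0, 0.1]).view(M, 1, 1).expand(M, B, 1)
mu, logvar = fuse_gpoe(means, logvars, cred)
print(mu.shape, logvar.shape)  # torch.Size([4, 16]) torch.Size([4, 16])
```

A modality whose credibility is near zero contributes almost no precision, so the joint estimate automatically leans on the cleaner modalities, which is the behaviour the abstract describes.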
Related papers
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: an Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose NativE, a comprehensive framework for multi-modal knowledge graph completion (MMKGC) in the wild.
NativE introduces a relation-guided dual adaptive fusion module that enables adaptive fusion for arbitrary modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes conventional joint multimodal learning as an alternating unimodal learning process (see the sketch after this list).
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method [3.0903319879656084]
This paper introduces an innovative approach to feature alignment that revolutionizes the fusion of multimodal information.
Our method employs a novel iterative process of telescopic displacement and expansion of feature representations across different modalities, culminating in a coherent unified representation within a shared feature space.
arXiv Detail & Related papers (2023-06-29T13:49:06Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Our key contribution is the derivation of lower and upper bounds on these interactions, based on a precise information-theoretic definition.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Generalizing Multimodal Variational Methods to Sets [35.69942798534849]
This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space.
By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensate for the drawbacks caused by factorization.
arXiv Detail & Related papers (2022-12-19T23:50:19Z)
- Few-shot Multimodal Sentiment Analysis based on Multimodal Probabilistic Fusion Prompts [30.15646658460899]
Multimodal sentiment analysis has gained significant attention due to the proliferation of multimodal content on social media.
Existing studies in this area rely heavily on large-scale supervised data, which is time-consuming and labor-intensive to collect.
We propose a novel method called Multimodal Probabilistic Fusion Prompts (MultiPoint) that leverages diverse cues from different modalities for multimodal sentiment detection in the few-shot scenario.
arXiv Detail & Related papers (2022-11-12T08:10:35Z)
- Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations [27.855467591358018]
We introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation.
We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints.
Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition.
arXiv Detail & Related papers (2022-10-31T16:14:18Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose the Channel-Exchanging-Network (CEN), which is self-adaptive, parameter-free, and, more importantly, applicable to both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For dense image prediction, the validity of CEN is tested in four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
- Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning [10.151012770913624]
We show that noise estimation for multimodal data can be reduced to a multimodal density estimation task.
We demonstrate how our noise estimation can be broadly integrated and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2020-03-06T13:25:12Z)
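The last entry above reduces noise estimation to multimodal density estimation. As a rough illustration of that idea only (a generic kernel-density stand-in, not the cited paper's estimator), one can fit a density model on concatenated embeddings of paired samples and treat low-density pairs as likely noisy:

```python
# Illustrative sketch: score paired multimodal samples by their joint density
# so low-density (likely noisy or misaligned) pairs can be down-weighted.
# This is a generic KDE-based stand-in, not the cited paper's estimator.
import numpy as np
from sklearn.neighbors import KernelDensity


def noise_scores(emb_a, emb_b, bandwidth=1.0):
    """emb_a, emb_b: (N, D) embeddings of the same N samples from two modalities.
    Returns a log-density per pair; lower values suggest noisier pairs."""
    joint = np.concatenate([emb_a, emb_b], axis=1)       # (N, 2D) joint space
    kde = KernelDensity(bandwidth=bandwidth).fit(joint)  # fit the joint density
    return kde.score_samples(joint)


rng = np.random.default_rng(0)
a = rng.normal(size=(100, 8))
b = a + 0.05 * rng.normal(size=(100, 8))  # mostly well-aligned second modality
b[:5] = 5.0 * rng.normal(size=(5, 8))     # a few corrupted (noisy) pairs
scores = noise_scores(a, b)
print("suspected noisy indices:", np.argsort(scores)[:5])
```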
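The MLA entry earlier in this list describes training one modality at a time against a head shared across modalities rather than running a single fused forward pass. The sketch below (referenced from that entry) renders only that alternating pattern; the encoder architectures, gradient-modification step, and test-time fusion of the actual paper are omitted, and all module names and sizes are hypothetical.

```python
# Schematic sketch of alternating unimodal training with a shared head, in the
# spirit of the MLA entry above. Architectures and hyperparameters are made up.
import torch
import torch.nn as nn

encoders = nn.ModuleDict({
    "audio": nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 32)),
    "video": nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 32)),
})
shared_head = nn.Linear(32, 10)  # one head, optimized by every modality in turn
loss_fn = nn.CrossEntropyLoss()
params = list(encoders.parameters()) + list(shared_head.parameters())
opt = torch.optim.Adam(params, lr=1e-3)


def unimodal_step(modality, x, y):
    """One update that uses a single modality's encoder plus the shared head."""
    logits = shared_head(encoders[modality](x))
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


# Alternate over modalities instead of doing a joint fused forward pass.
for step in range(4):
    x_audio, x_video = torch.randn(8, 40), torch.randn(8, 512)
    y = torch.randint(0, 10, (8,))
    unimodal_step("audio", x_audio, y)
    unimodal_step("video", x_video, y)
```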