Interpretable Tensor Fusion
- URL: http://arxiv.org/abs/2405.04671v1
- Date: Tue, 7 May 2024 21:05:50 GMT
- Title: Interpretable Tensor Fusion
- Authors: Saurabh Varshneya, Antoine Ledent, Philipp Liznerski, Andriy Balinskyy, Purvanshi Mehta, Waleed Mustafa, Marius Kloft
- Abstract summary: We introduce interpretable tensor fusion (InTense), a method for training neural networks to simultaneously learn multimodal data representations and their interpretable fusion.
InTense provides interpretability out of the box by assigning relevance scores to modalities and their associations.
Experiments on six real-world datasets show that InTense outperforms existing state-of-the-art multimodal interpretable approaches in terms of accuracy and interpretability.
- Score: 26.314148163750257
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional machine learning methods are predominantly designed to predict outcomes based on a single data type. However, practical applications may encompass data of diverse types, such as text, images, and audio. We introduce interpretable tensor fusion (InTense), a multimodal learning method for training neural networks to simultaneously learn multimodal data representations and their interpretable fusion. InTense can separately capture both linear combinations and multiplicative interactions of diverse data types, thereby disentangling higher-order interactions from the individual effects of each modality. InTense provides interpretability out of the box by assigning relevance scores to modalities and their associations. The approach is theoretically grounded and yields meaningful relevance scores on multiple synthetic and real-world datasets. Experiments on six real-world datasets show that InTense outperforms existing state-of-the-art multimodal interpretable approaches in terms of accuracy and interpretability.
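To make the fusion idea concrete, below is a minimal two-modality sketch in the spirit of InTense; it is not the authors' implementation, and all names (`TensorFusionSketch`, `relevance_scores`) are hypothetical. Linear heads model each modality's individual effect, a head on the outer product models their multiplicative interaction, and normalized weight norms serve as a crude stand-in for relevance scores.
```python
# A minimal sketch in the spirit of InTense, not the authors' code.
# Assumes two modalities; all names are hypothetical.
import torch
import torch.nn as nn


class TensorFusionSketch(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, hidden: int, n_classes: int):
        super().__init__()
        # Unimodal encoders producing fixed-size representations.
        self.enc_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
        # Linear heads: the individual (additive) effect of each modality.
        self.head_a = nn.Linear(hidden, n_classes)
        self.head_b = nn.Linear(hidden, n_classes)
        # Interaction head: acts on the outer product of the two
        # representations, capturing multiplicative cross-modal effects.
        self.head_ab = nn.Linear(hidden * hidden, n_classes)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        h_a, h_b = self.enc_a(x_a), self.enc_b(x_b)
        # Per-sample outer product, flattened to (batch, hidden * hidden).
        inter = torch.einsum("bi,bj->bij", h_a, h_b).flatten(1)
        return self.head_a(h_a) + self.head_b(h_b) + self.head_ab(inter)

    def relevance_scores(self) -> torch.Tensor:
        # Crude proxy: normalized weight norm of each branch. InTense
        # derives theoretically grounded scores; this only illustrates
        # the idea of attributing the prediction to the three branches.
        with torch.no_grad():
            norms = torch.stack([
                self.head_a.weight.norm(),
                self.head_b.weight.norm(),
                self.head_ab.weight.norm(),
            ])
            return norms / norms.sum()
```
After training, `relevance_scores()` returns three numbers indicating how much the model relies on modality A, modality B, and their interaction, which is the kind of disentangled attribution the abstract describes.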
Related papers
- Flexible inference in heterogeneous and attributed multilayer networks [21.349513661012498]
We develop a probabilistic generative model to perform inference in multilayer networks with arbitrary types of information.
We demonstrate its ability to unveil a variety of patterns in a social support network among villagers in rural India.
arXiv Detail & Related papers (2024-05-31T15:21:59Z)
- What is different between these datasets? [23.271594219577185]
Two comparable datasets in the same domain may have different distributions.
We propose a suite of interpretable methods (toolbox) for comparing two datasets.
Our methods not only outperform comparable and related approaches in terms of explanation quality and correctness, but also provide actionable, complementary insights to understand and mitigate dataset differences effectively.
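As a hedged illustration of the task in this entry (not the paper's toolbox), a classifier two-sample test is one common interpretable baseline: if a classifier can distinguish the two datasets, they differ, and its feature importances hint at where.
```python
# Illustrative classifier two-sample test; not the methods from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
data_a = rng.normal(size=(500, 5))          # dataset A
data_b = rng.normal(size=(500, 5))          # dataset B ...
data_b[:, 2] += 0.8                         # ... with feature 2 shifted

X = np.vstack([data_a, data_b])
y = np.array([0] * len(data_a) + [1] * len(data_b))

# Cross-validated accuracy well above 0.5 means the distributions differ.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("separability:", cross_val_score(clf, X, y, cv=5).mean())

# Feature importances point to where the datasets differ (feature 2 here).
print("importances:", clf.fit(X, y).feature_importances_)
```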
arXiv Detail & Related papers (2024-03-08T19:52:39Z)
- MultiFIX: An XAI-friendly feature inducing approach to building models from multimodal data [0.0]
MultiFIX is a new interpretability-focused multimodal data fusion pipeline.
An end-to-end deep learning architecture is used to train a predictive model.
We apply MultiFIX to a publicly available dataset for the detection of malignant skin lesions.
arXiv Detail & Related papers (2024-02-19T14:45:46Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorous theoretical guarantees, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
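For context, a generic variational information bottleneck loss is sketched below; this is the standard VIB formulation with a diagonal-Gaussian posterior, not the paper's exact multi-view objective.
```python
# Standard VIB loss (assumed form, not the paper's exact objective):
# cross-entropy keeps z predictive of y, the KL term compresses z.
import torch
import torch.nn.functional as F


def vib_loss(mu, logvar, logits, labels, beta=1e-3):
    ce = F.cross_entropy(logits, labels)  # variational bound tied to I(z; y)
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian encoder.
    kl = -0.5 * torch.mean(
        torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)
    )
    return ce + beta * kl
```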
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Enhancing ensemble learning and transfer learning in multimodal data analysis by adaptive dimensionality reduction [10.646114896709717]
In multimodal data analysis, not all observations show the same level of reliability or information quality.
We propose an adaptive approach for dimensionality reduction to overcome this issue.
We test our approach on multimodal datasets acquired in diverse research fields.
arXiv Detail & Related papers (2021-05-08T11:53:12Z)
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
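A minimal sketch of the underlying idea, combining a shared embedding space with clustering-derived pseudo-labels as extra positives; all names are illustrative and this is not the authors' code.
```python
# Illustrative clustering-augmented contrastive loss; not the paper's code.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def cluster_contrastive_loss(z_video, z_audio, n_clusters=8, tau=0.1):
    # Cluster the joint embedding space; same-cluster samples become
    # additional positives across modalities (semantic similarity).
    joint = torch.cat([z_video, z_audio], dim=0)
    labels = torch.as_tensor(
        KMeans(n_clusters=n_clusters, n_init=10)
        .fit_predict(joint.detach().cpu().numpy())
    )

    z = F.normalize(joint, dim=1)
    sim = z @ z.t() / tau                       # scaled cosine similarities
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos.fill_diagonal_(False)                   # exclude self-pairs

    # Pull each anchor toward its same-cluster samples (softmax over all).
    log_prob = F.log_softmax(sim, dim=1)
    n_pos = pos.sum(dim=1).clamp(min=1)
    return -(log_prob * pos).sum(dim=1).div(n_pos).mean()
```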
arXiv Detail & Related papers (2021-04-26T15:55:01Z)
- Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis [103.69656907534456]
Recent multimodal learning methods with strong performance on human-centric tasks are often black boxes.
We propose Multimodal Routing, which adjusts weights between input modalities and output representations differently for each input sample.
arXiv Detail & Related papers (2020-04-29T13:42:22Z)
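A hedged sketch of the per-sample routing idea from the entry above follows; the simple softmax gate here is illustrative and far simpler than the paper's routing mechanism.
```python
# Illustrative per-sample modality routing; not the paper's implementation.
import torch
import torch.nn as nn


class SampleRoutingSketch(nn.Module):
    def __init__(self, dims, n_classes):
        super().__init__()
        # One prediction head per modality.
        self.heads = nn.ModuleList([nn.Linear(d, n_classes) for d in dims])
        # A gate that assigns each *sample* its own modality weights.
        self.gate = nn.Linear(sum(dims), len(dims))

    def forward(self, feats):
        # feats: list of per-modality tensors, each of shape (batch, d_i).
        weights = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        preds = torch.stack([h(f) for h, f in zip(self.heads, feats)], dim=1)
        # Returned `weights` are the per-sample routing coefficients that
        # make each modality's contribution locally inspectable.
        return (weights.unsqueeze(-1) * preds).sum(dim=1), weights
```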
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.