SHARCS: Shared Concept Space for Explainable Multimodal Learning
- URL: http://arxiv.org/abs/2307.00316v1
- Date: Sat, 1 Jul 2023 12:05:20 GMT
- Title: SHARCS: Shared Concept Space for Explainable Multimodal Learning
- Authors: Gabriele Dominici, Pietro Barbiero, Lucie Charlotte Magister, Pietro Liò, Nikola Simidjievski
- Abstract summary: We introduce SHARCS -- a novel concept-based approach for explainable multimodal learning.
SHARCS learns and maps interpretable concepts from different heterogeneous modalities into a single unified concept-manifold.
We show that SHARCS remains effective, and significantly outperforms other approaches, in practically relevant scenarios such as retrieval of missing modalities.
- Score: 3.899855581265356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning is an essential paradigm for addressing complex
real-world problems, where individual data modalities are typically
insufficient to accurately solve a given modelling task. While various deep
learning approaches have successfully addressed these challenges, their
reasoning process is often opaque, limiting the capability for principled,
explainable cross-modal analysis and for domain-expert intervention. In this
paper, we introduce SHARCS (SHARed Concept Space) -- a novel concept-based
approach for explainable multimodal learning. SHARCS learns and maps
interpretable concepts from different heterogeneous modalities into a single
unified concept-manifold, which leads to an intuitive projection of
semantically similar cross-modal concepts. We demonstrate that such an approach
can lead to inherently explainable task predictions while also improving
downstream predictive performance. Moreover, we show that SHARCS remains
effective, and significantly outperforms other approaches, in practically
significant scenarios such as retrieval of missing modalities and cross-modal
explanations. Our approach is model-agnostic and easily applicable to different
types (and number) of modalities, thus advancing the development of effective,
interpretable, and trustworthy multimodal approaches.
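To make the mechanism concrete, the following is a minimal sketch of how modality-specific encoders might expose an interpretable concept layer, project it into a shared space, and support nearest-neighbour retrieval of a missing modality. It is an illustration under assumptions, not the authors' implementation: names such as ConceptEncoder, shared_dim and retrieve_missing_modality, the sigmoid concept activations, and the PyTorch framing are all hypothetical choices.

```python
# Hypothetical sketch of a shared concept space for two modalities.
# NOT the SHARCS implementation; module and variable names are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ConceptEncoder(nn.Module):
    """Maps raw modality features to concept activations, then projects
    those activations into a shared (cross-modal) space."""

    def __init__(self, in_dim: int, n_concepts: int, shared_dim: int):
        super().__init__()
        self.to_concepts = nn.Sequential(
            nn.Linear(in_dim, n_concepts),
            nn.Sigmoid(),  # activations in [0, 1] are easy to inspect per concept
        )
        self.to_shared = nn.Linear(n_concepts, shared_dim)

    def forward(self, x: torch.Tensor):
        concepts = self.to_concepts(x)                       # interpretable layer
        shared = F.normalize(self.to_shared(concepts), dim=-1)
        return concepts, shared


def retrieve_missing_modality(query_shared: torch.Tensor,
                              candidate_shared: torch.Tensor) -> torch.Tensor:
    """Nearest-neighbour retrieval in the shared space: for each query sample
    from one modality, return the index of the closest candidate sample."""
    sims = query_shared @ candidate_shared.T                 # cosine similarity
    return sims.argmax(dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    img_enc = ConceptEncoder(in_dim=512, n_concepts=16, shared_dim=32)
    txt_enc = ConceptEncoder(in_dim=300, n_concepts=16, shared_dim=32)

    img_feats, txt_feats = torch.randn(8, 512), torch.randn(8, 300)
    _, img_shared = img_enc(img_feats)
    _, txt_shared = txt_enc(txt_feats)

    # Retrieve the text sample whose shared-space embedding is closest to each image.
    print(retrieve_missing_modality(img_shared, txt_shared))
```

In practice, a training objective that pulls paired samples together in the shared space, plus a task head operating on the shared embeddings, would sit on top of this sketch; only the projection and retrieval steps are shown here.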
Related papers
- On the Comparison between Multi-modal and Single-modal Contrastive Learning [50.74988548106031]
We introduce a theoretical foundation for understanding the differences between multi-modal and single-modal contrastive learning.
We identify the signal-to-noise ratio (SNR) as the critical factor affecting downstream-task generalization in both multi-modal and single-modal contrastive learning.
Our analysis provides a unified framework that can characterize the optimization and generalization of both single-modal and multi-modal contrastive learning.
arXiv Detail & Related papers (2024-11-05T06:21:17Z) - Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances from multiple modalities when only a few labeled examples are available.
We propose a Generative Transfer Learning framework consisting of two stages: the first involves training on abundant unimodal data, and the second focuses on transfer learning to adapt to novel data.
Our findings demonstrate that GTL achieves superior performance compared to state-of-the-art methods across four distinct multi-modal datasets.
arXiv Detail & Related papers (2024-10-14T16:09:38Z) - Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z) - Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal Prediction for Multimodal Sentiment Analysis [19.07020276666615]
We propose a novel framework named MultiModal Contrastive Learning (MMCL) for multimodal representation to capture intra- and inter-modality dynamics simultaneously.
We also design two contrastive learning tasks, instance- and sentiment-based contrastive learning, to aid prediction and to learn more sentiment-related interactive information (a generic cross-modal contrastive objective of this kind is sketched after this list).
arXiv Detail & Related papers (2022-10-26T08:24:15Z) - Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
arXiv Detail & Related papers (2020-09-10T14:16:58Z) - Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
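Several of the entries above (e.g., the multi-modal vs. single-modal analysis and MMCL) revolve around cross-modal contrastive objectives. The snippet below is a generic, symmetric InfoNCE-style loss given as a rough illustration of that family; it is not the exact objective of any listed paper, and the function name and temperature value are assumptions.

```python
# Generic symmetric InfoNCE-style cross-modal contrastive loss (illustration only).

import torch
import torch.nn.functional as F


def cross_modal_info_nce(z_a: torch.Tensor, z_b: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of paired samples from two modalities.
    Pairs sharing a row index are positives; all other rows act as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature                 # pairwise similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric loss: modality A retrieves B and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))


if __name__ == "__main__":
    torch.manual_seed(0)
    loss = cross_modal_info_nce(torch.randn(16, 64), torch.randn(16, 64))
    print(float(loss))
```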