Deep Metric Loss for Multimodal Learning
- URL: http://arxiv.org/abs/2308.10486v1
- Date: Mon, 21 Aug 2023 06:04:30 GMT
- Title: Deep Metric Loss for Multimodal Learning
- Authors: Sehwan Moon and Hyunju Lee
- Abstract summary: We introduce a novel MultiModal loss paradigm for multimodal learning.
MultiModal loss can prevent inefficient learning caused by overfitting and efficiently optimize multimodal models.
Our loss is empirically shown to improve the performance of recent models.
- Score: 3.8979646385036175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal learning often outperforms its unimodal counterparts by exploiting
unimodal contributions and cross-modal interactions. However, focusing only on
integrating multimodal features into a unified comprehensive representation
overlooks the unimodal characteristics. In real data, the contributions of
modalities can vary from instance to instance, and they often reinforce or
conflict with each other. In this study, we introduce a novel MultiModal
loss paradigm for multimodal learning, which subgroups instances according to
their unimodal contributions. MultiModal loss can prevent inefficient
learning caused by overfitting and efficiently optimize multimodal models. On
synthetic data, MultiModal loss demonstrates improved classification
performance by subgrouping difficult instances within certain modalities. On
four real multimodal datasets, our loss is empirically shown to improve the
performance of recent models. Ablation studies verify the effectiveness of our
loss. Additionally, we show that our loss generates a reliable prediction score
for each modality, which is essential for subgrouping. Our MultiModal
loss is a novel loss function to subgroup instances according to the
contribution of modalities in multimodal learning and is applicable to a
variety of multimodal models with unimodal decisions. Our code is available at
https://github.com/SehwanMoon/MultiModalLoss.
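As an illustration of the subgrouping idea described in the abstract, the sketch below is a hedged, minimal interpretation rather than the authors' exact formulation (see the repository for that): each modality keeps its own classification head, the softmax probability of the true class serves as that modality's prediction score, and those scores softly weight the per-modality loss terms. All function and variable names here are hypothetical.

```python
# A minimal, illustrative sketch (not the authors' exact formulation): each
# modality has its own classification head, the per-instance confidence of the
# correct class serves as that modality's "prediction score", and instances are
# softly re-weighted (subgrouped) so that each modality focuses on the
# instances it can actually contribute to.
import torch
import torch.nn.functional as F

def multimodal_subgroup_loss(unimodal_logits, fused_logits, targets, tau=1.0):
    """unimodal_logits: list of (B, C) tensors, one per modality.
    fused_logits: (B, C) tensor from the fused/multimodal head.
    targets: (B,) class indices."""
    # Per-modality prediction score: probability assigned to the true class.
    scores = [F.softmax(l, dim=1).gather(1, targets[:, None]).squeeze(1).detach()
              for l in unimodal_logits]                           # each (B,)
    # Soft subgroup weights: instances where a modality is relatively confident
    # get a larger weight for that modality's loss term.
    weights = F.softmax(torch.stack(scores, dim=0) / tau, dim=0)  # (M, B)
    loss = F.cross_entropy(fused_logits, targets)
    for w, logits in zip(weights, unimodal_logits):
        per_instance = F.cross_entropy(logits, targets, reduction="none")  # (B,)
        loss = loss + (w * per_instance).mean()
    return loss

# Toy usage with two modalities.
B, C = 8, 3
logits_a = torch.randn(B, C, requires_grad=True)
logits_b = torch.randn(B, C, requires_grad=True)
fused = torch.randn(B, C, requires_grad=True)
y = torch.randint(0, C, (B,))
multimodal_subgroup_loss([logits_a, logits_b], fused, y).backward()
```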
Related papers
- Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach [29.428067329993173]
We propose a modality invariant multimodal learning method, which is less susceptible to the impact of missing modalities.
It consists of a single-branch network that shares weights across multiple modalities to learn inter-modality representations and maximize performance.
Compared to existing state-of-the-art methods, our proposed method achieves superior performance both when all modalities are present and when modalities are missing during training or testing.
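A minimal sketch of the single-branch idea as described in the summary, assuming modality-specific projections into a common dimension followed by one weight-shared trunk; the exact architecture and the missing-modality handling shown here are assumptions, not the paper's implementation.

```python
# Hedged sketch of the single-branch idea: modality-specific projections map
# each input to a common dimension, after which one shared trunk (the "single
# branch") processes every modality with the same weights. Missing modalities
# are simply skipped and the available embeddings are averaged.
import torch
import torch.nn as nn

class SingleBranchNet(nn.Module):
    def __init__(self, input_dims, hidden=128, num_classes=4):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in input_dims])
        self.trunk = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, inputs):
        # inputs: list aligned with input_dims; None marks a missing modality.
        embs = [self.trunk(p(x)) for p, x in zip(self.proj, inputs) if x is not None]
        return self.head(torch.stack(embs, dim=0).mean(dim=0))

model = SingleBranchNet(input_dims=[32, 64])
out = model([torch.randn(8, 32), None])   # second (hypothetical) modality missing
print(out.shape)                          # torch.Size([8, 4])
```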
arXiv Detail & Related papers (2024-08-14T10:32:16Z)
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
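The following is a loose, hedged sketch of a joint-embedding approach to missing-modality prediction, assuming frozen stand-in encoders and a small trainable predictor with a cosine objective; it is not the paper's actual framework or fine-tuning scheme.

```python
# A rough sketch of the joint-embedding idea under assumptions: two frozen
# pretrained unimodal encoders (stand-ins here), and a small trainable
# predictor that maps one modality's embedding to the other's embedding space
# so a missing modality can be approximated at inference time.
import torch
import torch.nn as nn
import torch.nn.functional as F

enc_a = nn.Linear(32, 64)   # stand-in for a pretrained encoder (frozen)
enc_b = nn.Linear(48, 64)   # stand-in for a second pretrained encoder (frozen)
for p in list(enc_a.parameters()) + list(enc_b.parameters()):
    p.requires_grad = False

predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

x_a, x_b = torch.randn(16, 32), torch.randn(16, 48)   # paired training batch
z_a, z_b = enc_a(x_a), enc_b(x_b)
# Self-supervised objective: predicted embedding should match the real one.
loss = 1 - F.cosine_similarity(predictor(z_a), z_b.detach(), dim=1).mean()
loss.backward()
opt.step()

# At test time, if modality B is missing, predictor(enc_a(x_a)) stands in for z_b.
```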
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
- MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance [10.580712937465032]
We identify the previously ignored gradient conflict between multimodal and unimodal learning objectives.
We propose MMPareto algorithm, which could ensure a final gradient with direction common to all learning objectives.
Our method is also expected to facilitate multi-task cases with a clear discrepancy in task difficulty.
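A simplified sketch in the spirit of the stated goal (a final gradient whose direction agrees with every objective), not the MMPareto algorithm itself: when the multimodal and unimodal gradients conflict, their unit vectors are combined, which provably yields a non-negative inner product with both.

```python
# Simplified conflict-aware gradient combination (not the exact MMPareto
# procedure): if the two gradients agree, sum them; if they conflict, combine
# their unit vectors, which always has a non-negative inner product with each.
import torch

def combine_gradients(g_multi, g_uni, eps=1e-12):
    if torch.dot(g_multi, g_uni) >= 0:
        return g_multi + g_uni                       # no conflict: plain sum
    g1 = g_multi / (g_multi.norm() + eps)
    g2 = g_uni / (g_uni.norm() + eps)
    # Rescale so the magnitude stays comparable to the original gradients.
    return 0.5 * (g_multi.norm() + g_uni.norm()) * (g1 + g2)

g_multi = torch.tensor([1.0, 0.0])
g_uni = torch.tensor([-0.5, 1.0])                    # conflicting direction
g = combine_gradients(g_multi, g_uni)
print(torch.dot(g, g_multi) >= 0, torch.dot(g, g_uni) >= 0)  # both True
```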
arXiv Detail & Related papers (2024-05-28T01:19:13Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
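Purely illustrative sketch of multiscale fusion (the U3M architecture is not reproduced here): per-scale features from two modalities, e.g. RGB and depth, are fused with an unweighted mean, and the per-scale predictions are upsampled and summed for a dense output.

```python
# Illustrative multiscale fusion sketch: features from two modalities are fused
# at several spatial scales with a modality-agnostic elementwise mean, then
# upsampled and summed for a dense prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleFusion(nn.Module):
    def __init__(self, channels=(16, 32, 64), num_classes=5):
        super().__init__()
        self.heads = nn.ModuleList([nn.Conv2d(c, num_classes, 1) for c in channels])

    def forward(self, feats_a, feats_b, out_size):
        # feats_a / feats_b: lists of per-scale feature maps, one per modality.
        out = 0
        for head, fa, fb in zip(self.heads, feats_a, feats_b):
            fused = 0.5 * (fa + fb)                   # unbiased per-scale fusion
            out = out + F.interpolate(head(fused), size=out_size,
                                      mode="bilinear", align_corners=False)
        return out

fuser = MultiscaleFusion()
feats_rgb = [torch.randn(2, c, s, s) for c, s in zip((16, 32, 64), (64, 32, 16))]
feats_depth = [torch.randn(2, c, s, s) for c, s in zip((16, 32, 64), (64, 32, 16))]
print(fuser(feats_rgb, feats_depth, out_size=(128, 128)).shape)  # (2, 5, 128, 128)
```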
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Multimodal Representation Learning by Alternating Unimodal Adaptation [73.15829571740866]
We propose MLA (Multimodal Learning with Alternating Unimodal Adaptation) to overcome challenges where some modalities appear more dominant than others during multimodal learning.
MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process.
It captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities.
Experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities.
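A condensed sketch of the alternating training loop under assumptions about the architecture: one encoder per modality, a single shared head optimized at every step, and modalities visited in turn; the real MLA procedure may differ in its scheduling and head design.

```python
# Alternating unimodal training sketch: each modality has its own encoder, a
# single shared head is optimized at every step, and training alternates one
# modality at a time.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoders = nn.ModuleList([nn.Linear(32, 64), nn.Linear(48, 64)])  # one per modality
shared_head = nn.Linear(64, 3)
opt = torch.optim.SGD(list(encoders.parameters()) + list(shared_head.parameters()), lr=0.1)

batches = [(torch.randn(8, 32), torch.randint(0, 3, (8,))),   # modality-0 batch
           (torch.randn(8, 48), torch.randint(0, 3, (8,)))]   # modality-1 batch

for step in range(4):
    m = step % len(encoders)              # alternate over modalities
    x, y = batches[m]
    logits = shared_head(encoders[m](x))  # cross-modal interaction via the shared head
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()                       # updates encoder m and the shared head only
    opt.step()
```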
arXiv Detail & Related papers (2023-11-17T18:57:40Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
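A loose sketch of an early-fusion, single-stream encoder, assuming skeleton-style modality streams (e.g. joint, bone, motion) that are embedded, concatenated once at the input, and processed by one encoder; dimensions and layers here are placeholders, not the UmURL design.

```python
# Early-fusion, single-stream encoder sketch: modality inputs are embedded,
# concatenated once at the input, and processed by a single encoder.
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    def __init__(self, modality_dims=(150, 150, 150), hidden=256):
        super().__init__()
        self.embed = nn.ModuleList([nn.Linear(d, hidden) for d in modality_dims])
        self.encoder = nn.Sequential(nn.Linear(hidden * len(modality_dims), hidden),
                                     nn.ReLU(), nn.Linear(hidden, hidden))

    def forward(self, streams):
        fused = torch.cat([e(x) for e, x in zip(self.embed, streams)], dim=-1)
        return self.encoder(fused)        # single-stream representation

enc = EarlyFusionEncoder()
joint, bone, motion = (torch.randn(4, 150) for _ in range(3))
print(enc([joint, bone, motion]).shape)   # torch.Size([4, 256])
```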
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
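MMLoRA's specifics are not given in the summary, so the sketch below shows generic low-rank adaptation of a frozen linear layer, which is the technique family the name refers to; treat it as a stand-in rather than the paper's method.

```python
# Generic low-rank adaptation sketch: a frozen pretrained linear layer is
# augmented with a trainable low-rank update, so only a small number of
# parameters are tuned.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=4, alpha=8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # keep pretrained weights frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

pretrained = nn.Linear(64, 64)                      # stand-in for a pretrained layer
adapted = LoRALinear(pretrained, rank=4)
print(adapted(torch.randn(2, 64)).shape)            # torch.Size([2, 64])
```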
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation [16.17270247327955]
We propose a simple and parameter-efficient adaptation procedure for pretrained multimodal networks.
We demonstrate that such adaptation can partially bridge performance drop due to missing modalities.
Our proposed method demonstrates versatility across various tasks and datasets, and outperforms existing methods for robust multimodal learning with missing modalities.
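A hedged sketch of one way parameter-efficient adaptation for missing modalities could look: a frozen backbone plus small residual bottleneck adapters keyed by the observed modality pattern; the adapter placement and keying scheme are assumptions, not the paper's exact procedure.

```python
# Parameter-efficient adaptation sketch: the pretrained backbone stays frozen
# and a small bottleneck adapter, selected by the observed modality pattern,
# adjusts its features.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim=64, bottleneck=8):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual adapter

backbone = nn.Linear(64, 64)                  # stand-in for a frozen pretrained block
for p in backbone.parameters():
    p.requires_grad = False

# One lightweight adapter per missing-modality pattern (hypothetical keys).
adapters = nn.ModuleDict({"all": BottleneckAdapter(), "no_audio": BottleneckAdapter()})

x = torch.randn(4, 64)
pattern = "no_audio"                          # which modalities are available
out = adapters[pattern](backbone(x))          # only adapter parameters are trainable
print(out.shape)                              # torch.Size([4, 64])
```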
arXiv Detail & Related papers (2023-10-06T03:04:21Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
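A minimal sketch of the projection-to-common-space idea: per-modality projections map features of different dimensionalities into one shared space, and a permutation-invariant mean aggregation lets a modality combination unseen during training still be fused; all dimensions and names are hypothetical.

```python
# Common-space projection sketch: per-modality projections map features of
# different dimensionalities into one shared space, and a permutation-invariant
# aggregation (mean) lets unseen modality combinations be fused at test time.
import torch
import torch.nn as nn

class CommonSpaceFusion(nn.Module):
    def __init__(self, modality_dims=None, common=256, num_classes=10):
        super().__init__()
        modality_dims = modality_dims or {"video": 512, "audio": 128, "text": 300}
        self.proj = nn.ModuleDict({m: nn.Linear(d, common) for m, d in modality_dims.items()})
        self.head = nn.Linear(common, num_classes)

    def forward(self, feats):
        # feats: dict of whatever modalities happen to be available.
        z = torch.stack([self.proj[m](x) for m, x in feats.items()], dim=0).mean(dim=0)
        return self.head(z)

model = CommonSpaceFusion()
# Trained on e.g. video+audio, evaluated on an unseen audio+text combination:
print(model({"audio": torch.randn(2, 128), "text": torch.randn(2, 300)}).shape)  # (2, 10)
```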
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation [144.55713938260828]
It is difficult for non-autoregressive translation models to capture the multi-modal distribution of target translations.
We decompose it into short- and long-range syntactic multi-modalities and evaluate several recent NAT algorithms with advanced loss functions.
We design a new loss function to better handle the complicated syntactic multi-modality in real-world datasets.
arXiv Detail & Related papers (2022-07-09T06:48:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.