MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences
- URL: http://arxiv.org/abs/2210.12798v1
- Date: Sun, 23 Oct 2022 17:44:56 GMT
- Title: MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences
- Authors: Wei Han, Hui Chen, Min-Yen Kan, Soujanya Poria
- Abstract summary: We present a novel approach named MM-Align to address the missing-modality inference problem.
MM-Align learns to capture and imitate the alignment dynamics between modality sequences.
Our method can perform more accurate and faster inference and relieve overfitting under various missing conditions.
- Score: 32.42505193560884
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Existing multimodal tasks mostly target the complete-input setting, i.e., each modality is either fully present or entirely absent in both training and test sets. The randomly-missing situation, however, remains underexplored. In this paper, we present a novel approach named MM-Align to address the missing-modality inference problem. Concretely, we propose 1) an alignment dynamics learning module based on the theory of optimal transport (OT) for indirect missing-data imputation; and 2) a denoising training algorithm that simultaneously enhances the imputation results and the backbone network's performance. In contrast to previous methods that reconstruct the missing inputs directly, MM-Align learns to capture and imitate the alignment dynamics between modality sequences. Comprehensive experiments on three datasets covering two multimodal tasks empirically demonstrate that our method performs more accurate and faster inference and relieves overfitting under various missing conditions.
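For illustration only: the paper defines MM-Align's actual modules, but the core ingredient, entropic optimal transport between two modality sequences, can be sketched with Sinkhorn iterations plus a barycentric projection for imputation. All names and hyperparameters (`eps`, `n_iters`, `y_ref`) below are assumptions, not the authors' implementation.

```python
import torch

def sinkhorn_plan(x, y, eps=0.1, n_iters=50):
    """Entropic-OT coupling between two feature sequences (a sketch,
    not MM-Align's actual module). x: (m, d), y: (n, d)."""
    cost = torch.cdist(x, y) ** 2                  # pairwise squared-L2 cost
    K = torch.exp(-cost / eps)                     # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))  # uniform source marginal
    b = torch.full((y.size(0),), 1.0 / y.size(0))  # uniform target marginal
    u = torch.ones_like(a)
    for _ in range(n_iters):                       # Sinkhorn fixed-point updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]             # plan = diag(u) K diag(v)

def impute_by_alignment(x_obs, y_ref, eps=0.1):
    """Barycentric projection: synthesize features for the missing modality
    at each observed timestep, using y_ref as reference features (e.g.,
    drawn from modality-complete training pairs)."""
    plan = sinkhorn_plan(x_obs, y_ref, eps)
    plan = plan / plan.sum(dim=1, keepdim=True)    # row-normalize the plan
    return plan @ y_ref                            # (m, d) imputed sequence
```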
Related papers
- Multimodal Fusion Balancing Through Game-Theoretic Regularization [3.2065271838977627]
We show that current balancing methods struggle to train multimodal models that surpass even simple baselines, such as ensembles.
This raises the question: how can we ensure that all modalities in multimodal training are sufficiently trained, and that learning from new modalities consistently improves performance?
This paper proposes the Multimodal Competition Regularizer (MCR), a new loss component inspired by mutual information (MI) decomposition.
arXiv Detail & Related papers (2024-11-11T19:53:05Z)
- On-the-fly Modulation for Balanced Multimodal Learning [53.616094855778954]
Multimodal learning is expected to boost model performance by integrating information from different modalities.
The widely-used joint training strategy leads to imbalanced and under-optimized uni-modal representations.
We propose On-the-fly Prediction Modulation (OPM) and On-the-fly Gradient Modulation (OGM) strategies to modulate the optimization of each modality.
arXiv Detail & Related papers (2024-10-15T13:15:50Z)
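The paper's OPM/OGM rules are its own; the following is a minimal sketch of the general idea of on-the-fly gradient modulation: estimate which modality currently dominates and attenuate that branch's gradients. The dominance ratio and tanh scaling rule below are illustrative assumptions.

```python
import math
import torch.nn as nn

def modulation_coeffs(score_a, score_b, alpha=1.0):
    """Gradient-scaling coefficients from running uni-modal scores
    (e.g., mean softmax probability of the true class per modality).
    The currently dominant modality gets its gradients attenuated."""
    def coeff(own, other):
        ratio = own / (other + 1e-8)   # > 1 means "own" dominates
        return 1.0 - math.tanh(alpha * max(ratio - 1.0, 0.0))
    return coeff(score_a, score_b), coeff(score_b, score_a)

def modulate_gradients(encoder: nn.Module, coeff: float):
    """Call after loss.backward(): rescale one modality encoder's grads."""
    for p in encoder.parameters():
        if p.grad is not None:
            p.grad.mul_(coeff)
```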
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
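A hedged sketch of the general recipe this summary describes: frozen unimodal backbones, small trainable heads, and a joint-embedding objective that makes paired embeddings coincide, so a missing modality's embedding can be approximated from the available one's. All module names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    """Frozen unimodal encoders + trainable projection heads mapping
    paired samples to nearby points in a shared embedding space."""
    def __init__(self, enc_a, enc_b, dim_a, dim_b, dim_joint=256):
        super().__init__()
        self.enc_a, self.enc_b = enc_a.eval(), enc_b.eval()   # frozen
        for p in list(enc_a.parameters()) + list(enc_b.parameters()):
            p.requires_grad_(False)
        self.head_a = nn.Linear(dim_a, dim_joint)             # trainable
        self.head_b = nn.Linear(dim_b, dim_joint)

    def forward(self, xa, xb):
        za = F.normalize(self.head_a(self.enc_a(xa)), dim=-1)
        zb = F.normalize(self.head_b(self.enc_b(xb)), dim=-1)
        return za, zb

def infonce(za, zb, tau=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings."""
    logits = za @ zb.T / tau
    target = torch.arange(za.size(0), device=za.device)
    return 0.5 * (F.cross_entropy(logits, target)
                  + F.cross_entropy(logits.T, target))
```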
"Harmonized Transfer Learning and Modality alignment (HarMA)" is a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment.
HarMA achieves state-of-the-art performance in two popular multimodal retrieval tasks in the field of remote sensing.
arXiv Detail & Related papers (2024-04-28T17:20:08Z)
- Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity [9.811378971225727]
This paper extends the current research into missing modalities to the low-data regime.
It is often expensive to get full-modality data and sufficient annotated training samples.
We propose to use retrieval-augmented in-context learning to address these two crucial issues.
arXiv Detail & Related papers (2024-03-14T14:19:48Z)
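A minimal sketch of the retrieval side of this idea: find modality-complete training neighbors of a modality-incomplete query and hand them to the model as in-context examples. The cosine-similarity retrieval and all names below are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def retrieve_neighbors(query_emb, bank_embs, k=4):
    """Indices of the k most similar modality-complete training samples,
    by cosine similarity in a shared embedding space."""
    sims = F.normalize(query_emb, dim=-1) @ F.normalize(bank_embs, dim=-1).T
    return sims.topk(k, dim=-1).indices

# Usage sketch: embed the query with whichever modality is available,
# fetch full-modality neighbors, and prepend them as in-context examples.
# idx = retrieve_neighbors(embed(available_modality), bank_embs)
# context = [train_set[i] for i in idx.tolist()]
```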
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all multi-task data for training.
At the task level, we aim to find the optimal task order that minimizes the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
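A minimal sketch of the instance-level step described above. Difficulty scores are assumed to be supplied externally (e.g., a per-instance loss from a preliminary model); the easiest-first batching is an illustrative schedule, not the authors' exact procedure.

```python
def easy_to_difficult_batches(instances, difficulty, batch_size):
    """Sort one task's instances by a supplied difficulty score and
    emit mini-batches from easiest to hardest."""
    order = sorted(range(len(instances)), key=lambda i: difficulty[i])
    for start in range(0, len(order), batch_size):
        yield [instances[i] for i in order[start:start + batch_size]]

# Usage: iterate tasks in the (separately optimized) task order, and
# within each task feed batches easiest-first.
# for batch in easy_to_difficult_batches(data, scores, 32):
#     train_step(batch)
```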
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
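A hedged sketch of early fusion in a single stream: project each modality into a common token space, concatenate, and encode once with a shared encoder. The use of a Transformer encoder and all dimensions are assumptions for illustration, not UmURL's actual architecture.

```python
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    """Single-stream encoder over concatenated multi-modal tokens."""
    def __init__(self, dims, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in dims)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, modalities):
        # modalities: list of (batch, seq_i, dims[i]) tensors
        tokens = [p(m) for p, m in zip(self.proj, modalities)]
        fused = torch.cat(tokens, dim=1)   # early fusion: one token sequence
        return self.encoder(fused)         # one forward pass, one stream
```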
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning [53.68371566336254]
We argue that the key to better performance lies in meaningful latent modality structures instead of perfect modality alignment.
Specifically, we design 1) a deep feature separation loss for intra-modality regularization; 2) a Brownian-bridge loss for inter-modality regularization; and 3) a geometric consistency loss for both intra- and inter-modality regularization.
arXiv Detail & Related papers (2023-03-10T14:38:49Z)
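The paper's precise losses are its own; as a worked hint at what a Brownian-bridge constraint between modality embeddings can look like (our reading, stated as an assumption): an intermediate representation z_t at relative position t in [0, 1] is encouraged to stay near the bridge pinned at the endpoint embeddings z_0 (one modality) and z_1 (the other).

```latex
% Brownian bridge between modality embeddings z_0 and z_1 (illustrative):
% the bridge pinned at z_0 (t = 0) and z_1 (t = 1) has conditional law
%   z_t ~ N( (1 - t) z_0 + t z_1,  t (1 - t) \sigma^2 I ),
% so a natural regularizer penalizes deviation from the interpolant:
\mathcal{L}_{\text{bridge}}
  = \mathbb{E}_{t}\,
    \frac{\bigl\lVert z_t - \bigl((1 - t)\, z_0 + t\, z_1\bigr)\bigr\rVert_2^2}
         {t\,(1 - t)\,\sigma^2}
```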
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing-modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and consistently improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
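A hedged sketch of the invariant-feature idea: pull per-modality features toward a shared (modality-invariant) space, then train an imagination head to predict a missing modality's features from that space. The MSE losses and module names are assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InvariantImagination(nn.Module):
    def __init__(self, enc_a, enc_v, d_feat, d_inv=128):
        super().__init__()
        self.enc_a, self.enc_v = enc_a, enc_v      # e.g., audio / visual
        self.to_inv = nn.Linear(d_feat, d_inv)     # shared invariant space
        self.imagine_v = nn.Linear(d_inv, d_feat)  # predict visual features

    def forward(self, xa, xv=None):
        ha = self.enc_a(xa)
        inv_a = self.to_inv(ha)
        if xv is None:                             # visual modality missing
            return ha, self.imagine_v(inv_a)       # imagined visual features
        hv = self.enc_v(xv)
        inv_v = self.to_inv(hv)
        # invariance loss: per-modality features agree in the shared space
        inv_loss = F.mse_loss(inv_a, inv_v)
        # imagination loss: recover real visual features from invariants
        imag_loss = F.mse_loss(self.imagine_v(inv_a), hv.detach())
        return ha, hv, inv_loss + imag_loss
```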
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.