Towards Good Practices for Missing Modality Robust Action Recognition
- URL: http://arxiv.org/abs/2211.13916v2
- Date: Thu, 30 Mar 2023 06:35:24 GMT
- Title: Towards Good Practices for Missing Modality Robust Action Recognition
- Authors: Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim
- Abstract summary: This paper seeks a set of good practices for multi-modal action recognition.
First, we study how to effectively regularize the model during training.
Second, we investigate fusion methods for robustness to missing modalities.
Third, we propose a simple modular network, ActionMAE, which learns missing modality predictive coding.
- Score: 20.26021126604409
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Standard multi-modal models assume that the same modalities are used
in the training and inference stages. In practice, however, the environment in
which multi-modal models operate may not satisfy this assumption, and their
performance degrades drastically if any modality is missing at the inference
stage. We ask: how can we train a model that is robust to missing modalities?
This paper seeks a set of good practices for multi-modal action recognition,
with a particular interest in circumstances where some modalities are not
available at inference time. First, we study how to effectively regularize the
model during training (e.g., data augmentation). Second, we investigate fusion
methods for robustness to missing modalities: we find that transformer-based
fusion is more robust to missing modalities than summation or concatenation.
Third, we propose a simple modular network, ActionMAE, which learns missing
modality predictive coding by randomly dropping modality features and
reconstructing them from the remaining modality features. Coupling these good
practices, we build a model that is not only effective in multi-modal action
recognition but also robust to missing modalities. Our model achieves
state-of-the-art results on multiple benchmarks and maintains competitive
performance even in missing-modality scenarios. Code is available at
https://github.com/sangminwoo/ActionMAE.
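
The following sketch illustrates the recipe described in the abstract: encode each modality into a feature vector, randomly drop some modality features during training, reconstruct the dropped ones from the remaining modalities, and fuse the modality tokens with a transformer (rather than summation or concatenation) for classification. This is a hypothetical PyTorch illustration, not the official ActionMAE implementation (see the repository above); the module names, feature dimension, class count, and drop probability are all assumptions.

```python
# Hypothetical sketch of missing-modality-robust recognition (assumptions only,
# not the official ActionMAE code): per-modality features, random modality
# dropout, reconstruction of dropped features, and transformer-based fusion.
import torch
import torch.nn as nn


class MissingModalityRobustHead(nn.Module):
    def __init__(self, num_modalities=3, dim=256, num_classes=101, drop_prob=0.3):
        super().__init__()
        self.drop_prob = drop_prob
        # One learnable placeholder token per modality, substituted when that
        # modality is dropped (training) or missing (inference).
        self.placeholders = nn.Parameter(torch.zeros(num_modalities, dim))
        # Small transformer that predicts the dropped modality features from
        # the remaining ones ("missing modality predictive coding").
        self.reconstructor = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Transformer-based fusion, which the paper finds more robust to
        # missing modalities than summation or concatenation.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.fusion = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats, present_mask=None):
        # feats: (B, M, D) per-modality feature vectors.
        # present_mask: (B, M) bool, True where the modality is available;
        # if None, modalities are dropped at random (training-time behavior).
        B, M, D = feats.shape
        if present_mask is None:
            present_mask = torch.rand(B, M, device=feats.device) > self.drop_prob
            no_modality = ~present_mask.any(dim=1)
            present_mask[no_modality, 0] = True  # keep at least one modality

        # Replace dropped/missing modality features with learnable placeholders,
        # then let the reconstructor predict the dropped features.
        placeholders = self.placeholders.unsqueeze(0).expand(B, -1, -1)
        tokens = torch.where(present_mask.unsqueeze(-1), feats, placeholders)
        recon = self.reconstructor(tokens)

        # Auxiliary reconstruction loss on the dropped positions only
        # (meaningful during training, when ground-truth features exist).
        dropped = ~present_mask
        if dropped.any():
            recon_loss = ((recon - feats) ** 2)[dropped].mean()
        else:
            recon_loss = recon.new_zeros(())

        # Fuse the (partly reconstructed) modality tokens with a transformer
        # and classify from a learnable fusion token.
        cls = self.cls_token.expand(B, -1, -1)
        fused = self.fusion(torch.cat([cls, recon], dim=1))[:, 0]
        return self.classifier(fused), recon_loss


# Usage: train with random modality dropout; at inference, pass the actual
# availability mask (here, only the first of three modalities is present).
model = MissingModalityRobustHead(num_modalities=3, dim=256, num_classes=101)
feats = torch.randn(4, 3, 256)
logits, recon_loss = model(feats)
logits, _ = model(feats, present_mask=torch.tensor([[True, False, False]] * 4))
```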
Related papers
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
- Dealing with All-stage Missing Modality: Towards A Universal Model with Robust Reconstruction and Personalization [14.606035444283984]
Current approaches focus on developing models that handle modality-incomplete inputs during inference.
We propose a robust universal model with modality reconstruction and model personalization.
Our method has been extensively validated on two brain tumor segmentation benchmarks.
arXiv Detail & Related papers (2024-06-04T06:07:24Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
- Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast [23.936677199734213]
In this paper, we introduce a prototype library into the FedAvg-based Federated Learning framework.
The proposed method utilizes prototypes as masks representing missing modalities to formulate a task-calibrated training loss and a model-agnostic uni-modality inference strategy.
Compared to the baselines, our method improved inference accuracy by 3.7% with 50% modality missing during training and by 23.8% during uni-modality inference.
arXiv Detail & Related papers (2023-12-21T00:55:12Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
- What Makes for Robust Multi-Modal Models in the Face of Missing Modalities? [35.19295402483624]
We model the scenarios of multi-modal models encountering missing modalities from an information-theoretic perspective.
We introduce Uni-Modal Ensemble with Missing Modality Adaptation (UME-MMA)
UME-MMA employs uni-modal pre-trained weights for the multi-modal model to enhance feature extraction and utilizes missing modality data augmentation techniques to better adapt to situations with missing modalities.
arXiv Detail & Related papers (2023-10-10T07:47:57Z)
- Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning.
We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA)
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Mutual Modality Learning for Video Action Classification [74.83718206963579]
We show how to embed multi-modality into a single model for video action classification.
We achieve state-of-the-art results on the Something-Something-v2 benchmark.
arXiv Detail & Related papers (2020-11-04T21:20:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.