Modality Compensation Network: Cross-Modal Adaptation for Action
Recognition
- URL: http://arxiv.org/abs/2001.11657v1
- Date: Fri, 31 Jan 2020 04:51:55 GMT
- Title: Modality Compensation Network: Cross-Modal Adaptation for Action
Recognition
- Authors: Sijie Song, Jiaying Liu, Yanghao Li, Zongming Guo
- Abstract summary: We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
- Score: 77.24983234113957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the prevalence of RGB-D cameras, multi-modal video data have become more
available for human action recognition. One main challenge for this task lies
in how to effectively leverage their complementary information. In this work,
we propose a Modality Compensation Network (MCN) to explore the relationships
of different modalities, and boost the representations for human action
recognition. We regard RGB/optical flow videos as source modalities and
skeletons as the auxiliary modality. Our goal is to extract more discriminative
features from the source modalities with the help of the auxiliary modality.
Built on deep Convolutional Neural Networks (CNNs) and Long Short-Term Memory
(LSTM) networks, our model bridges data from the source and auxiliary
modalities with a modality adaptation block to achieve adaptive representation
learning, such that the network learns to compensate for the loss of skeletons
at test time, and even at training time. We explore multiple adaptation schemes
to narrow the distance between the source and auxiliary modal distributions at
different levels, according to the alignment of source and auxiliary data
during training. In addition, skeletons are required only in the training
phase; our model improves recognition performance using only source data at
test time.
Experimental results reveal that MCN outperforms state-of-the-art approaches on
four widely-used action recognition benchmarks.
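To make the adaptation idea concrete, here is a minimal PyTorch sketch of one plausible instantiation: a CNN+LSTM source branch, an LSTM skeleton branch used only during training, and a linear-kernel MMD loss standing in for the paper's adaptation schemes. All layer sizes, the choice of MMD, and the 25-joint skeleton dimension are assumptions for illustration, not the authors' implementation.
```python
# Illustrative sketch of a modality-compensation setup (not the authors' code).
# A CNN+LSTM branch encodes the source modality (RGB/flow); a small LSTM
# encodes the auxiliary skeleton modality. An MMD-style loss (one possible
# adaptation scheme; the paper explores several) pulls the two feature
# distributions together during training, so skeletons are not needed at test time.
import torch
import torch.nn as nn

class SourceBranch(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=60):
        super().__init__()
        self.cnn = nn.Sequential(  # stand-in for a deep CNN backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):                 # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        f = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        h, _ = self.lstm(f)
        feat = h[:, -1]                       # last hidden state
        return self.fc(feat), feat

def mmd(x, y):
    """Simple linear-kernel MMD between two feature batches."""
    return (x.mean(0) - y.mean(0)).pow(2).sum()

skeleton_enc = nn.LSTM(75, 256, batch_first=True)  # 25 joints x 3 coords (assumed)
model = SourceBranch()
ce = nn.CrossEntropyLoss()

def training_step(clips, skeletons, labels, lam=0.1):
    # At test time only the first term (and only `model`) is used.
    logits, src_feat = model(clips)
    aux_h, _ = skeleton_enc(skeletons)
    return ce(logits, labels) + lam * mmd(src_feat, aux_h[:, -1])
```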
Related papers
- Robust Divergence Learning for Missing-Modality Segmentation [6.144772447916824]
Multimodal Magnetic Resonance Imaging (MRI) provides essential complementary information for analyzing brain tumor subregions.
While methods using four common MRI modalities for automatic segmentation have shown success, they often face challenges with missing modalities due to image quality issues, inconsistent protocols, allergic reactions, or cost factors.
A novel single-modality parallel processing network framework based on Hölder divergence and mutual information is introduced.
arXiv Detail & Related papers (2024-11-13T03:03:30Z)
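For reference, one standard form of the Hölder (pseudo-)divergence mentioned in the entry above follows directly from Hölder's inequality; whether the paper uses exactly this one-parameter variant is not stated in the summary, so treat this as an illustrative sketch.
```python
# One standard Hölder pseudo-divergence between discrete densities, built on
# Hölder's inequality (sum(p*q) <= ||p||_a * ||q||_b for 1/a + 1/b = 1).
# The exact variant used by the paper is not given in the summary above.
import numpy as np

def holder_divergence(p, q, alpha=2.0, eps=1e-12):
    p = np.asarray(p, float); q = np.asarray(q, float)
    beta = alpha / (alpha - 1.0)              # conjugate exponent
    inner = np.sum(p * q)
    norm = (np.sum(p**alpha) ** (1 / alpha)) * (np.sum(q**beta) ** (1 / beta))
    return -np.log((inner + eps) / (norm + eps))  # >= 0 by Hölder's inequality

p = np.array([0.7, 0.2, 0.1]); q = np.array([0.1, 0.3, 0.6])
print(holder_divergence(p, q))
```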
- Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter [32.64004722423187]
We show how to improve the robustness of RGB-skeleton action recognition models.
We propose the Attention-based Modality Reweighter (AMR).
Our AMR is plug-and-play, allowing easy integration with multimodal models.
arXiv Detail & Related papers (2024-07-29T13:15:51Z)
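A minimal sketch of what an attention-based modality reweighter can look like, under an assumed feature dimension and a simple gating MLP; the paper's exact AMR design is not specified in the summary above.
```python
# A generic attention-based modality reweighter in the spirit of AMR:
# a gating network scores each modality's features, and a softmax-weighted
# sum fuses them. Plug-and-play in the sense that it only consumes features.
import torch
import torch.nn as nn

class ModalityReweighter(nn.Module):
    def __init__(self, dim=256, num_modalities=2):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim * num_modalities, 64), nn.ReLU(),
            nn.Linear(64, num_modalities))

    def forward(self, feats):                  # list of (B, dim) tensors
        w = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)
        stacked = torch.stack(feats, dim=1)    # (B, M, dim)
        return (w.unsqueeze(-1) * stacked).sum(dim=1)

fused = ModalityReweighter()([torch.randn(4, 256), torch.randn(4, 256)])
```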
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
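As a hedged illustration of test-time-only adaptation, the sketch below runs a generic online entropy-minimization update on unlabeled test clips. MiDl's actual self-supervised objective is not given in the summary above, so this is a stand-in for the setting rather than the method.
```python
# Generic online test-time adaptation loop: the deployed model updates itself
# on each unlabeled test clip (which may arrive with a modality missing) using
# a self-supervised loss, here prediction entropy as a placeholder objective.
import torch

def adapt_online(model, test_stream, lr=1e-4):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    preds = []
    for clip in test_stream:
        logits = model(clip)
        p = torch.softmax(logits, dim=-1)
        loss = -(p * torch.log(p + 1e-8)).sum(-1).mean()  # entropy of predictions
        opt.zero_grad(); loss.backward(); opt.step()      # one online update
        preds.append(logits.argmax(-1).detach())
    return preds
```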
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
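The general recipe named in this entry, frozen SSL-pretrained unimodal encoders plus small trainable adapters, might look roughly like the following sketch; the bottleneck design, dimensions, and 7-class head are assumptions for illustration.
```python
# Adapting frozen unimodal encoders with residual bottleneck adapters.
# The stand-in nn.Linear encoders represent pretrained SSL audio/video models.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adapter

audio_enc, video_enc = nn.Linear(128, 768), nn.Linear(128, 768)  # stand-ins
for enc in (audio_enc, video_enc):
    enc.requires_grad_(False)                 # encoders stay frozen
adapters = nn.ModuleList([BottleneckAdapter(), BottleneckAdapter()])
head = nn.Linear(768 * 2, 7)                  # 7 expression classes (assumed)

def forward(audio, video):
    a = adapters[0](audio_enc(audio))
    v = adapters[1](video_enc(video))
    return head(torch.cat([a, v], dim=-1))    # only adapters + head are trained
```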
- Adaptive Affinity-Based Generalization For MRI Imaging Segmentation Across Resource-Limited Settings [1.5703963908242198]
This paper introduces a novel relation-based knowledge framework by seamlessly combining adaptive affinity-based and kernel-based distillation.
To validate our innovative approach, we conducted experiments on publicly available multi-source prostate MRI data.
arXiv Detail & Related papers (2024-04-03T13:35:51Z)
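A common way to realize affinity-based distillation is to match the pairwise similarity (Gram) structure of teacher and student feature batches, as sketched below; the paper's combined affinity- and kernel-based formulation is not fully specified in the summary, so this is only one plausible form.
```python
# Affinity-based distillation: the student mimics the teacher's pairwise
# feature-similarity structure over a batch, so feature dimensions can differ.
import torch
import torch.nn.functional as F

def affinity_distill_loss(student_feats, teacher_feats):
    # Row-normalized batch Gram (affinity) matrices, both (B, B).
    gs = F.normalize(student_feats @ student_feats.t(), dim=1)
    gt = F.normalize(teacher_feats @ teacher_feats.t(), dim=1)
    return F.mse_loss(gs, gt)

loss = affinity_distill_loss(torch.randn(8, 128), torch.randn(8, 256))
```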
- See Through Their Minds: Learning Transferable Neural Representation from Cross-Subject fMRI [32.40827290083577]
Deciphering visual content from functional Magnetic Resonance Imaging (fMRI) helps illuminate the human vision system.
Previous approaches primarily employ subject-specific models, which are sensitive to training sample size.
We propose shallow subject-specific adapters to map cross-subject fMRI data into unified representations.
During training, we leverage both visual and textual supervision for multi-modal brain decoding.
arXiv Detail & Related papers (2024-03-11T01:18:49Z)
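A minimal sketch of per-subject adapters feeding a shared backbone, assuming linear adapters and made-up voxel counts; the paper's actual adapter architecture may differ.
```python
# One shallow adapter per subject maps that subject's fMRI vector into a
# shared space consumed by a single cross-subject backbone.
import torch
import torch.nn as nn

class CrossSubjectModel(nn.Module):
    def __init__(self, subject_dims, shared_dim=1024):
        super().__init__()
        self.adapters = nn.ModuleDict(
            {sid: nn.Linear(d, shared_dim) for sid, d in subject_dims.items()})
        self.backbone = nn.Sequential(
            nn.Linear(shared_dim, 512), nn.ReLU(), nn.Linear(512, 512))

    def forward(self, x, subject_id):
        return self.backbone(self.adapters[subject_id](x))

model = CrossSubjectModel({"sub01": 15000, "sub02": 14000})  # voxel counts assumed
z = model(torch.randn(2, 15000), "sub01")
```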
- Adaptive Parameterization of Deep Learning Models for Federated Learning [85.82002651944254]
Federated Learning offers a way to train deep neural networks in a distributed fashion.
It incurs a communication overhead as the model parameters or gradients need to be exchanged regularly during training.
In this paper, we propose to utilise parallel Adapters for Federated Learning.
arXiv Detail & Related papers (2023-02-06T17:30:33Z)
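To see why adapters cut communication in this setting, here is a sketch of federated averaging applied only to the clients' adapter weights; FedAvg is a generic choice here, not necessarily the paper's aggregation rule.
```python
# Server-side aggregation over adapter parameters only: clients keep their
# full backbones local and exchange just the small adapter state dicts.
import copy
import torch

def fedavg_adapters(client_adapters):
    """Average the adapter state dicts returned by the clients."""
    avg = copy.deepcopy(client_adapters[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in client_adapters])
        avg[key] = stacked.mean(dim=0)
    return avg
```
Each round, the server broadcasts the averaged adapter weights back to the clients, so the per-round traffic scales with the adapter size rather than the full model.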
- Joint-bone Fusion Graph Convolutional Network for Semi-supervised Skeleton Action Recognition [65.78703941973183]
We propose a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder.
Specifically, the CD-JBF-GCN can explore the motion transmission between the joint stream and the bone stream.
The pose prediction based auto-encoder in the self-supervised training stage allows the network to learn motion representation from unlabeled data.
arXiv Detail & Related papers (2022-02-08T16:03:15Z)
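The joint/bone two-stream input that CD-JBF-GCN builds on can be derived as below, with a toy 5-joint topology assumed for illustration; the paper's correlation-driven fusion GCN itself is not reproduced here.
```python
# Bones are vectors between connected joints; joint and bone tensors then feed
# two streams that the paper fuses with a correlation-driven GCN.
import torch

# (child, parent) pairs of an assumed toy 5-joint skeleton
EDGES = [(1, 0), (2, 1), (3, 0), (4, 3)]

def joints_to_bones(joints):                 # joints: (B, T, J, 3)
    bones = torch.zeros_like(joints)         # root joint 0 keeps a zero bone
    for child, parent in EDGES:
        bones[:, :, child] = joints[:, :, child] - joints[:, :, parent]
    return bones

joints = torch.randn(2, 16, 5, 3)
bones = joints_to_bones(joints)              # second input stream
```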
- Deep Adaptive Inference Networks for Single Image Super-Resolution [72.7304455761067]
Single image super-resolution (SISR) has witnessed tremendous progress in recent years owing to the deployment of deep convolutional neural networks (CNNs).
In this paper, we take a step forward to address the inference-efficiency issue by leveraging adaptive inference networks for deep SISR (AdaDSR).
Our AdaDSR involves an SISR model as backbone and a lightweight adapter module which takes image features and resource constraint as input and predicts a map of local network depth.
arXiv Detail & Related papers (2020-04-08T10:08:20Z)
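A hedged sketch of the depth-map idea in the entry above: a lightweight adapter predicts a per-pixel depth from image features plus a resource budget, and block i only refines pixels whose predicted depth exceeds i. Masking stands in for the sparse execution a real implementation would use, and all shapes are assumptions.
```python
# Depth-adaptive inference: an adapter maps (features, budget) to a per-pixel
# depth map that gates how many residual blocks refine each location.
import torch
import torch.nn as nn

class AdaDepthSR(nn.Module):
    def __init__(self, channels=64, max_depth=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(max_depth)])
        self.adapter = nn.Conv2d(channels + 1, 1, 3, padding=1)  # features + budget
        self.max_depth = max_depth

    def forward(self, feats, budget):         # feats: (B,C,H,W); budget in [0,1]
        b, _, h, w = feats.shape
        cond = torch.full((b, 1, h, w), budget, device=feats.device)
        depth = torch.sigmoid(self.adapter(torch.cat([feats, cond], 1))) * self.max_depth
        for i, blk in enumerate(self.blocks):
            mask = (depth > i).float()        # pixels still being refined
            feats = feats + mask * torch.relu(blk(feats))
        return feats

out = AdaDepthSR()(torch.randn(1, 64, 32, 32), budget=0.5)
```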
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.