Decoupled Multimodal Distilling for Emotion Recognition
- URL: http://arxiv.org/abs/2303.13802v1
- Date: Fri, 24 Mar 2023 04:54:44 GMT
- Title: Decoupled Multimodal Distilling for Emotion Recognition
- Authors: Yong Li, Yuanzhi Wang, Zhen Cui
- Abstract summary: We propose a decoupled multimodal distillation (DMD) approach that facilitates flexible and adaptive crossmodal knowledge distillation.
The representation of each modality is decoupled into two parts, i.e., modality-irrelevant/-exclusive spaces, in a self-regression manner.
Experimental results show that DMD consistently outperforms state-of-the-art MER methods.
- Score: 21.685394946415993
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Human multimodal emotion recognition (MER) aims to perceive human emotions
via language, visual and acoustic modalities. Despite the impressive
performance of previous MER approaches, the inherent multimodal heterogeneities
remain a challenge, and the contribution of different modalities varies significantly.
In this work, we mitigate this issue by proposing a decoupled multimodal
distillation (DMD) approach that facilitates flexible and adaptive crossmodal
knowledge distillation, aiming to enhance the discriminative features of each
modality. Specifically, the representation of each modality is decoupled into two
parts, i.e., modality-irrelevant/-exclusive spaces, in a self-regression
manner. DMD utilizes a graph distillation unit (GD-Unit) for each decoupled
part so that each GD can be performed in a more specialized and effective
manner. A GD-Unit consists of a dynamic graph where each vertex represents a
modality and each edge indicates a dynamic knowledge distillation. Such a GD
paradigm provides a flexible knowledge transfer manner where the distillation
weights can be automatically learned, thus enabling diverse crossmodal
knowledge transfer patterns. Experimental results show that DMD consistently
outperforms state-of-the-art MER methods. Visualization results
show the graph edges in DMD exhibit meaningful distributional patterns w.r.t.
the modality-irrelevant/-exclusive feature spaces. Code is released at
https://github.com/mdswyz/DMD.
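For concreteness, the following is a minimal, hedged sketch of the two ideas in the abstract; it is not the authors' released code (see the repository above for that), and all module names, layer sizes, and loss choices are assumptions made for illustration.

```python
# Illustrative sketch, NOT the authors' implementation:
# (1) decouple a modality's feature into a shared ("modality-irrelevant") and a
#     private ("modality-exclusive") part with a self-regression/reconstruction
#     constraint, and
# (2) a GD-Unit-style graph distillation step whose learnable edge weights decide
#     how much knowledge flows between each ordered pair of modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoupler(nn.Module):
    """Split one modality's feature into shared + private parts (assumed layout)."""
    def __init__(self, dim):
        super().__init__()
        self.shared = nn.Linear(dim, dim)       # modality-irrelevant projection
        self.private = nn.Linear(dim, dim)      # modality-exclusive projection
        self.decoder = nn.Linear(2 * dim, dim)  # self-regression back to the input

    def forward(self, x):
        s, p = self.shared(x), self.private(x)
        recon = self.decoder(torch.cat([s, p], dim=-1))
        recon_loss = F.mse_loss(recon, x)       # keeps both parts informative
        return s, p, recon_loss

def graph_distill(logits, edge_logits):
    """One graph-distillation step: vertices = modalities, edges = soft weights.

    logits: list of [batch, classes] predictions, one per modality.
    edge_logits: [M, M] tensor; a row-wise softmax gives per-target edge weights.
    """
    weights = edge_logits.softmax(dim=-1)       # dynamic, learnable edge weights
    loss = 0.0
    for tgt in range(len(logits)):
        for src in range(len(logits)):
            if src == tgt:
                continue
            kd = F.kl_div(F.log_softmax(logits[tgt], dim=-1),
                          F.softmax(logits[src], dim=-1).detach(),
                          reduction="batchmean")
            loss = loss + weights[tgt, src] * kd
    return loss
```

In the paper the distillation weights are produced dynamically rather than stored as a free parameter; treating `edge_logits` as a learnable tensor here is a simplification for brevity.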
Related papers
- Multimodal Industrial Anomaly Detection by Crossmodal Reverse Distillation [15.89869857998053]
We propose Crossmodal Reverse Distillation (CRD), based on a multi-branch design, to realize multimodal industrial anomaly detection (AD).
By assigning independent branches to each modality, our method enables finer detection of anomalies within each modality.
Our method achieves state-of-the-art performance in multimodal anomaly detection and localization.
arXiv Detail & Related papers (2024-12-12T05:26:50Z)
- GAMED: Knowledge Adaptive Multi-Experts Decoupling for Multimodal Fake News Detection [18.157900272828602]
Multimodal fake news detection often involves modelling heterogeneous data sources, such as vision and language.
This paper develops a novel approach, GAMED, for multimodal modelling.
It focuses on generating distinctive and discriminative features through modal decoupling to enhance cross-modal synergies.
arXiv Detail & Related papers (2024-12-11T19:12:22Z)
- GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation [68.63955715643974]
We propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o).
arXiv Detail & Related papers (2024-07-08T01:06:13Z)
- MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances in self-supervised learning (SSL) to pre-train strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
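The key-based cross-attention mentioned above can be illustrated with a generic cross-modal attention block; this is a hedged sketch, not the JMT implementation, and the token counts and dimensions are placeholders.

```python
# Generic cross-modal attention of the kind the JMT summary describes: tokens
# from one modality attend to another modality's tokens. Illustrative only.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_tokens, context_tokens):
        # e.g. audio tokens querying visual tokens (used as keys and values)
        fused, _ = self.attn(query_tokens, context_tokens, context_tokens)
        return self.norm(query_tokens + fused)  # residual fusion

audio = torch.randn(2, 50, 128)   # [batch, time steps, feature dim]
video = torch.randn(2, 30, 128)
fused_audio = CrossModalAttention(128)(audio, video)
```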
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- Mutual Distillation Learning For Person Re-Identification [27.350415735863184]
We propose a novel approach, Mutual Distillation Learning for Person Re-identification (termed MDPR).
Our approach encompasses two branches: a hard content branch that extracts local features via a uniform horizontal partitioning strategy, and a soft content branch that dynamically distinguishes between foreground and background.
Our method achieves an impressive 88.7%/94.4% in mAP/Rank-1 on the DukeMTMC-reID dataset, surpassing the current state-of-the-art results.
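The "uniform horizontal partitioning" of the hard branch can be sketched as splitting a backbone feature map into equal-height stripes and pooling each one; this is a generic illustration, not the MDPR code, and the part count is an assumption.

```python
# Illustrative sketch of uniform horizontal partitioning: split a person feature
# map into equal-height stripes and pool each into a local descriptor. The soft
# branch and the mutual-distillation losses are not reproduced here.
import torch
import torch.nn.functional as F

def horizontal_parts(feat_map, num_parts=6):
    """feat_map: [batch, channels, height, width] -> list of [batch, channels] stripe features."""
    stripes = torch.chunk(feat_map, num_parts, dim=2)        # split along height
    return [F.adaptive_avg_pool2d(s, 1).flatten(1) for s in stripes]

parts = horizontal_parts(torch.randn(8, 2048, 24, 8))        # six local descriptors
```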
arXiv Detail & Related papers (2024-01-12T07:49:02Z)
- Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition [14.639340916340801]
We propose a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition (AR-IIGCN) method.
Firstly, we input video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces.
Secondly, we build a generator and a discriminator for the three modal features through adversarial representation.
Thirdly, we introduce contrastive graph representation learning to capture intra-modal and inter-modal complementary semantic information.
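A rough, hedged skeleton of these three steps (per-modality MLP mapping, an adversarial modality discriminator, and a contrastive term) is given below; layer sizes and loss forms are assumptions, not the AR-IIGCN implementation.

```python
# Skeleton of the three steps listed in the summary; illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityMLP(nn.Module):
    """Step 1: map raw video/audio/text features into a common-size space."""
    def __init__(self, in_dim, hid_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, hid_dim))

    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    """Step 2: guesses which modality a feature came from; the encoders are
    trained to fool it (standard adversarial alignment)."""
    def __init__(self, dim=256, n_modalities=3):
        super().__init__()
        self.clf = nn.Linear(dim, n_modalities)

    def forward(self, z):
        return self.clf(z)

def info_nce(anchor, positive, temperature=0.1):
    """Step 3 (simplified): a contrastive loss pulling paired views together,
    standing in for the paper's intra-/inter-modal graph contrastive terms."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)
```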
arXiv Detail & Related papers (2023-12-28T01:57:26Z)
- I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation [147.2183428328396]
We introduce a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework.
In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy.
arXiv Detail & Related papers (2023-10-24T07:22:17Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Fusion with Hierarchical Graphs for Multimodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical graph network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z)
- MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
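To make MISA's two-subspace idea concrete, the following hedged sketch pairs a similarity loss that pulls the modality-invariant parts of different modalities together with a soft orthogonality loss that keeps each modality's invariant and specific parts apart; the loss forms are simplified stand-ins, not the paper's exact choices.

```python
# Simplified constraints for a shared ("invariant") and private ("specific")
# subspace per modality; illustrative only, not the MISA implementation.
import torch
import torch.nn.functional as F

def similarity_loss(shared_a, shared_b):
    """Pull the modality-invariant parts of two modalities together."""
    return F.mse_loss(shared_a, shared_b)

def orthogonality_loss(shared, specific):
    """Discourage overlap between one modality's invariant and specific parts."""
    shared = F.normalize(shared, dim=-1)
    specific = F.normalize(specific, dim=-1)
    return (shared * specific).sum(dim=-1).pow(2).mean()

text_shared, text_specific = torch.randn(4, 128), torch.randn(4, 128)
audio_shared = torch.randn(4, 128)
loss = similarity_loss(text_shared, audio_shared) + orthogonality_loss(text_shared, text_specific)
```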