EffMulti: Efficiently Modeling Complex Multimodal Interactions for
Emotion Analysis
- URL: http://arxiv.org/abs/2212.08661v1
- Date: Fri, 16 Dec 2022 03:05:55 GMT
- Title: EffMulti: Efficiently Modeling Complex Multimodal Interactions for
Emotion Analysis
- Authors: Feng Qiu, Chengyang Xie, Yu Ding, Wanzeng Kong
- Abstract summary: We design three kinds of latent representations to refine the emotion analysis process.
A modality-semantic hierarchical fusion is proposed to reasonably incorporate these representations into a comprehensive interaction representation.
The experimental results demonstrate that our EffMulti outperforms the state-of-the-art methods.
- Score: 8.941102352671198
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans are skilled in reading the interlocutor's emotion from multimodal
signals, including spoken words, simultaneous speech, and facial expressions.
It is still a challenge to effectively decode emotions from the complex
interactions of multimodal signals. In this paper, we design three kinds of
multimodal latent representations to refine the emotion analysis process and
capture complex multimodal interactions from different views, including an
intact three-modal integrating representation, a modality-shared
representation, and three modality-individual representations. Then, a
modality-semantic hierarchical fusion is proposed to reasonably incorporate
these representations into a comprehensive interaction representation. The
experimental results demonstrate that our EffMulti outperforms the
state-of-the-art methods. The compelling performance benefits from its
well-designed framework with ease of implementation, lower computational
complexity, and fewer trainable parameters.
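The abstract describes an architecture built from three kinds of latent representations (an intact three-modal integrating representation, a modality-shared representation, and three modality-individual representations) combined by a modality-semantic hierarchical fusion. The following is a minimal, hypothetical PyTorch sketch of that idea, written only from the abstract: every layer choice, the feature dimensions, and the exact fusion order are assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of an EffMulti-style fusion; not the authors' code.
# Assumptions: utterance-level input features, linear encoders, and a
# two-stage (modality-level, then semantic-level) concatenation fusion.
import torch
import torch.nn as nn


class EffMultiSketch(nn.Module):
    def __init__(self, d_text=768, d_audio=74, d_vision=35, d_model=128):
        super().__init__()
        # Project each modality into a common space (t = text, a = audio, v = vision).
        self.proj = nn.ModuleDict({
            "t": nn.Linear(d_text, d_model),
            "a": nn.Linear(d_audio, d_model),
            "v": nn.Linear(d_vision, d_model),
        })
        # (1) Intact three-modal integrating representation.
        self.integrate = nn.Linear(3 * d_model, d_model)
        # (2) Modality-shared representation: one encoder applied to every modality.
        self.shared = nn.Linear(d_model, d_model)
        # (3) Modality-individual representations: a separate encoder per modality.
        self.individual = nn.ModuleDict({k: nn.Linear(d_model, d_model) for k in "tav"})
        # Modality-semantic hierarchical fusion (assumed: concatenation + linear layer per stage).
        self.fuse_modality = nn.Linear(4 * d_model, d_model)  # 3 individual + shared
        self.fuse_semantic = nn.Linear(2 * d_model, d_model)  # + integrating
        self.head = nn.Linear(d_model, 1)                     # emotion/sentiment score

    def forward(self, text, audio, vision):
        h = {k: torch.relu(self.proj[k](x)) for k, x in zip("tav", (text, audio, vision))}
        integrating = torch.relu(self.integrate(torch.cat([h["t"], h["a"], h["v"]], dim=-1)))
        shared = torch.stack([self.shared(h[k]) for k in "tav"]).mean(dim=0)
        individuals = [self.individual[k](h[k]) for k in "tav"]
        # Stage 1: fuse the modality-level views (individual + shared).
        modality_level = torch.relu(self.fuse_modality(torch.cat(individuals + [shared], dim=-1)))
        # Stage 2: fuse with the semantic-level integrating representation.
        comprehensive = torch.relu(self.fuse_semantic(torch.cat([modality_level, integrating], dim=-1)))
        return self.head(comprehensive)


# Example with dummy utterance-level features (batch of 2).
model = EffMultiSketch()
print(model(torch.randn(2, 768), torch.randn(2, 74), torch.randn(2, 35)).shape)  # torch.Size([2, 1])
```

The sketch keeps every fusion step as a single linear layer, which is in line with the abstract's claim of low computational complexity and few trainable parameters, but the actual encoders may differ.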
Related papers
- DeepInteraction++: Multi-Modality Interaction for Autonomous Driving [80.8837864849534]
We introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout.
DeepInteraction++ is a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
Experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks.
arXiv Detail & Related papers (2024-08-09T14:04:21Z)
- AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features.
Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition [18.571931295274975]
Multimodal emotion recognition aims to recognize emotions for each utterance of multiple modalities.
Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue.
We propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful).
arXiv Detail & Related papers (2023-11-18T08:21:42Z)
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Multimodal Prompt Transformer with Hybrid Contrastive Learning for
Emotion Recognition in Conversation [9.817888267356716]
multimodal Emotion Recognition in Conversation (ERC) faces two problems.
Deep emotion cues extraction was performed on modalities with strong representation ability.
Feature filters were designed as multimodal prompt information for modalities with weak representation ability.
MPT embeds multimodal fusion information into each attention layer of the Transformer.
arXiv Detail & Related papers (2023-10-04T13:54:46Z) - Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2022-12-20T07:02:32Z)
- InterMulti: Multi-view Multimodal Interactions with Text-dominated Hierarchical High-order Fusion for Emotion Analysis [10.048903012988882]
We propose a multimodal emotion analysis framework, InterMulti, to capture complex multimodal interactions from different views.
Our proposed framework decomposes signals of different modalities into three kinds of multimodal interaction representations.
The THHF (Text-dominated Hierarchical High-order Fusion) module reasonably integrates the above three kinds of representations into a comprehensive multimodal interaction representation.
arXiv Detail & Related papers (2022-12-20T07:02:32Z)
- An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition [27.96711773593048]
We propose the multi-modal end-to-end transformer (ME2ET), which can effectively model the tri-modal features interaction.
At the low-level, we propose the progressive tri-modal attention, which can model the tri-modal feature interactions by adopting a two-pass strategy.
At the high-level, we introduce the tri-modal feature fusion layer to explicitly aggregate the semantic representations of three modalities.
arXiv Detail & Related papers (2022-09-20T14:51:38Z)
- Multi-scale Cooperative Multimodal Transformers for Multimodal Sentiment Analysis in Videos [58.93586436289648]
We propose a multi-scale cooperative multimodal transformer (MCMulT) architecture for multimodal sentiment analysis.
Our model outperforms existing approaches on unaligned multimodal sequences and has strong performance on aligned multimodal sequences.
arXiv Detail & Related papers (2022-06-16T07:47:57Z)
- Fusion with Hierarchical Graphs for Multimodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical graph network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.