MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in
Conversations
- URL: http://arxiv.org/abs/2203.02385v1
- Date: Fri, 4 Mar 2022 15:42:53 GMT
- Title: MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in
Conversations
- Authors: Dou Hu, Xiaolong Hou, Lingwei Wei, Lianxin Jiang, Yang Mo
- Abstract summary: Multimodal Emotion Recognition in Conversations (ERC) has considerable prospects for developing empathetic machines.
Recent graph-based fusion methods aggregate multimodal information by exploring unimodal and cross-modal interactions in a graph.
We propose a novel Multimodal Dynamic Fusion Network (MM-DFN) to recognize emotions by fully understanding multimodal conversational context.
- Score: 5.5997926295092295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Emotion Recognition in Conversations (ERC) has considerable prospects for
developing empathetic machines. For multimodal ERC, it is vital to understand
context and fuse modality information in conversations. Recent graph-based
fusion methods generally aggregate multimodal information by exploring unimodal
and cross-modal interactions in a graph. However, they accumulate redundant
information at each layer, limiting the context understanding between
modalities. In this paper, we propose a novel Multimodal Dynamic Fusion Network
(MM-DFN) to recognize emotions by fully understanding multimodal conversational
context. Specifically, we design a new graph-based dynamic fusion module to
fuse multimodal contextual features in a conversation. The module reduces
redundancy and enhances complementarity between modalities by capturing the
dynamics of contextual information in different semantic spaces. Extensive
experiments on two public benchmark datasets demonstrate the effectiveness and
superiority of MM-DFN.
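To make the dynamic-fusion idea above concrete, here is a minimal PyTorch sketch of a gated graph fusion layer: utterance-modality nodes exchange messages over a conversation graph, and a GRU-style gate limits how much newly aggregated (possibly redundant) context is absorbed at each layer. All class names, shapes, and the toy adjacency matrix are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GatedGraphFusionLayer(nn.Module):
    """One fusion step over a conversation graph.

    Nodes are utterance features from all modalities; `adj` encodes
    unimodal (same-modality) and cross-modal (same-utterance) edges.
    A GRU-style gate decides how much newly aggregated context to keep,
    one simple way to curb redundancy accumulating across layers.
    """
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.gate = nn.GRUCell(dim, dim)

    def forward(self, h, adj):
        # h: (num_nodes, dim), adj: (num_nodes, num_nodes), row-normalized
        context = adj @ self.message(h)   # aggregate neighbor messages
        return self.gate(context, h)      # gated update of node states

class DynamicFusion(nn.Module):
    def __init__(self, dim, num_layers=3, num_classes=6):
        super().__init__()
        self.layers = nn.ModuleList([GatedGraphFusionLayer(dim) for _ in range(num_layers)])
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, h, adj):
        for layer in self.layers:
            h = layer(h, adj)
        return self.classifier(h)          # per-node emotion logits

# toy usage: 3 modalities x 4 utterances = 12 nodes of dimension 128
h = torch.randn(12, 128)
adj = torch.softmax(torch.randn(12, 12), dim=-1)  # stand-in for a real conversation graph
logits = DynamicFusion(128)(h, adj)
print(logits.shape)  # torch.Size([12, 6])
```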
Related papers
- Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation [12.455034591553506]
Multimodal Emotion Recognition in Conversation (MERC) can be applied to public opinion monitoring, intelligent dialogue robots, and other fields.
Previous work ignored the inter-modal alignment process and intra-modal noise before multimodal fusion.
We have developed a novel approach called Masked Graph Learning with Recurrent Alignment (MGLRA) to tackle this problem.
arXiv Detail & Related papers (2024-07-23T02:23:51Z)
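As a rough illustration of masked graph learning with recurrent alignment, the sketch below iteratively aligns one modality to another with masked cross-attention followed by a GRU update; the number of rounds, the mask probability, and all module names are assumptions rather than the MGLRA architecture itself.

```python
import torch
import torch.nn as nn

class RecurrentAlignment(nn.Module):
    """Iteratively aligns one modality to another before fusion.

    Each round: attend from the source modality to the target modality,
    then fold the attended context back into the source with a GRU cell.
    A random attention mask (a loose stand-in for the "masked graph" idea)
    drops a fraction of edges to suppress noisy connections.
    """
    def __init__(self, dim, rounds=3, mask_prob=0.2, heads=4):
        super().__init__()
        self.rounds, self.mask_prob, self.heads = rounds, mask_prob, heads
        self.attn = nn.MultiheadAttention(dim, num_heads=heads, batch_first=True)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq_len, dim) utterance features of two modalities
        b, n, d = src.shape
        for _ in range(self.rounds):
            # hypothetical edge dropout: True = blocked attention edge
            mask = torch.rand(b * self.heads, n, n, device=src.device) < self.mask_prob
            ctx, _ = self.attn(src, tgt, tgt, attn_mask=mask)
            src = self.update(ctx.reshape(-1, d), src.reshape(-1, d)).view(b, n, d)
        return src

text = torch.randn(2, 10, 128)   # toy text features
audio = torch.randn(2, 10, 128)  # toy audio features
aligned_text = RecurrentAlignment(128)(text, audio)
```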
- AIMDiT: Modality Augmentation and Interaction via Multimodal Dimension Transformation for Emotion Recognition in Conversations [57.99479708224221]
We propose a novel framework called AIMDiT to solve the problem of multimodal fusion of deep features.
Experiments conducted using our AIMDiT framework on the public benchmark dataset MELD reveal 2.34% and 2.87% improvements in terms of the Acc-7 and w-F1 metrics.
arXiv Detail & Related papers (2024-04-12T11:31:18Z)
- MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts [92.76662894585809]
We introduce an approach to enhance multimodal models, which we call Multimodal Mixtures of Experts (MMoE).
MMoE can be applied to various types of models to improve performance.
arXiv Detail & Related papers (2023-11-16T05:31:21Z)
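A minimal sketch of the mixture-of-experts fusion pattern the MMoE summary refers to: several fusion experts process the concatenated modalities and a gating network mixes their outputs. The expert design and gating here are generic placeholders, not the paper's routing by interaction type.

```python
import torch
import torch.nn as nn

class MixtureOfFusionExperts(nn.Module):
    """Toy mixture-of-experts fusion: each expert maps the concatenated
    modalities to a fused vector, and a gate softly weights the experts."""
    def __init__(self, dim, num_experts=3):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(3 * dim, num_experts)

    def forward(self, text, audio, vision):
        x = torch.cat([text, audio, vision], dim=-1)                  # (batch, 3*dim)
        weights = torch.softmax(self.gate(x), dim=-1)                 # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)    # (batch, E, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)           # (batch, dim)

fused = MixtureOfFusionExperts(128)(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
```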
- Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation [9.817888267356716]
Multimodal Emotion Recognition in Conversation (ERC) faces two problems.
Deep emotion cue extraction is performed on modalities with strong representation ability.
Feature filters are designed as multimodal prompt information for modalities with weak representation ability.
The Multimodal Prompt Transformer (MPT) embeds multimodal fusion information into each attention layer of the Transformer.
arXiv Detail & Related papers (2023-10-04T13:54:46Z)
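The prompt-injection idea can be sketched as follows: pooled audio-visual features are projected into a few prompt tokens that are prepended to the text sequence before every Transformer layer and stripped afterwards. The layer count, prompt length, and all names are assumptions, not the MPT implementation.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Sketch of injecting multimodal prompts into every attention layer:
    the prompts are re-attached before each layer and removed after it, so
    the text sequence length stays fixed while every layer sees the fused cue."""
    def __init__(self, dim=128, layers=4, prompt_len=4):
        super().__init__()
        self.prompt_len = prompt_len
        self.to_prompt = nn.Linear(dim, prompt_len * dim)
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
             for _ in range(layers)]
        )

    def forward(self, text_tokens, av_summary):
        # text_tokens: (batch, seq, dim); av_summary: (batch, dim) pooled audio+visual features
        b, _, d = text_tokens.shape
        prompts = self.to_prompt(av_summary).view(b, self.prompt_len, d)
        h = text_tokens
        for layer in self.layers:
            h = layer(torch.cat([prompts, h], dim=1))[:, self.prompt_len:]  # inject, run, strip
        return h

out = PromptedEncoder()(torch.randn(2, 20, 128), torch.randn(2, 128))
```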
- Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition [81.2011058113579]
We argue that both the feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps.
We propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration.
Our system achieves new state-of-the-art performance consistently.
arXiv Detail & Related papers (2023-08-08T18:11:27Z)
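A contribution-aware fusion step can be sketched very compactly: a learned scorer assigns each modality a contribution weight per utterance and the fused vector is their softmax-weighted sum. This is a generic reading of CFM, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class ContributionAwareFusion(nn.Module):
    """Sketch: score how much each modality should contribute, then take
    the softmax-weighted sum of the modality features."""
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, modalities):
        # modalities: (batch, num_modalities, dim)
        scores = self.scorer(modalities).squeeze(-1)        # (batch, M)
        weights = torch.softmax(scores, dim=-1)             # per-utterance contributions
        return (weights.unsqueeze(-1) * modalities).sum(1)  # (batch, dim)

fused = ContributionAwareFusion(128)(torch.randn(4, 3, 128))
```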
- Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z)
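The deep-equilibrium view of fusion is easy to sketch: a single fusion cell is iterated to a fixed point instead of stacking layers. The sketch below unrolls the forward iteration only and omits the implicit differentiation a real DEQ method relies on; all names and tolerances are assumptions.

```python
import torch
import torch.nn as nn

class DEQFusion(nn.Module):
    """Sketch of deep-equilibrium-style fusion: iterate z <- f(z, x) until the
    fused state stops changing and treat the fixed point as the fused vector."""
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())

    def forward(self, x, max_iters=50, tol=1e-4):
        # x: (batch, dim) pooled multimodal input
        z = torch.zeros_like(x)
        for _ in range(max_iters):
            z_next = self.cell(torch.cat([z, x], dim=-1))
            converged = (z_next - z).norm() < tol * (z.norm() + 1e-8)
            z = z_next
            if converged:
                break
        return z

z_star = DEQFusion(128)(torch.randn(4, 128))
```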
- Fusion with Hierarchical Graphs for Multimodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments showed the effectiveness of our proposed model for more accurate AER, which yielded state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z)
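One plausible reading (an assumption, not the authors' exact scheme) of a two-stage graph construction: first connect each utterance to its temporal neighbours within a modality, then connect the modality views of the same utterance, and run a GCN over the resulting adjacency.

```python
import torch

def build_hierarchical_graph(num_utt, num_modalities=3, window=2):
    """Sketch of a two-stage graph construction for multimodal ERC.
    Node id = modality * num_utt + utterance index."""
    n = num_utt * num_modalities
    adj = torch.zeros(n, n)
    # stage 1: intra-modal context edges within a temporal window
    for m in range(num_modalities):
        for i in range(num_utt):
            for j in range(max(0, i - window), min(num_utt, i + window + 1)):
                adj[m * num_utt + i, m * num_utt + j] = 1.0
    # stage 2: cross-modal edges linking the views of the same utterance
    for i in range(num_utt):
        for m1 in range(num_modalities):
            for m2 in range(num_modalities):
                adj[m1 * num_utt + i, m2 * num_utt + i] = 1.0
    return adj / adj.sum(dim=1, keepdim=True)   # row-normalize for message passing

adj = build_hierarchical_graph(num_utt=5)
print(adj.shape)  # torch.Size([15, 15])
```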
- MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation [32.15124603618625]
In this work, we propose MMGCN, a new model based on a multimodal fused graph convolutional network.
MMGCN can not only make use of multimodal dependencies effectively, but also leverage speaker information to model inter-speaker and intra-speaker dependency.
We evaluate our proposed model on two public benchmark datasets, IEMOCAP and MELD, and the results prove the effectiveness of MMGCN.
arXiv Detail & Related papers (2021-07-14T15:37:02Z)
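A minimal sketch of combining graph convolution with speaker information in the spirit of MMGCN: a speaker embedding is added to every utterance node before neighborhood aggregation. The fully connected toy graph, layer shapes, and names are assumptions.

```python
import torch
import torch.nn as nn

class SpeakerAwareGCNLayer(nn.Module):
    """One graph-convolution step over a multimodal utterance graph where a
    speaker embedding is injected into each node, so intra-/inter-speaker
    dependencies shape the aggregated messages."""
    def __init__(self, dim, num_speakers):
        super().__init__()
        self.speaker_emb = nn.Embedding(num_speakers, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, h, speaker_ids, adj):
        # h: (num_nodes, dim); speaker_ids: (num_nodes,); adj: row-normalized (num_nodes, num_nodes)
        h = h + self.speaker_emb(speaker_ids)    # inject speaker identity
        return torch.relu(adj @ self.proj(h))    # neighborhood aggregation

h = torch.randn(9, 128)                          # 3 modalities x 3 utterances
speakers = torch.tensor([0, 1, 0] * 3)           # toy speaker assignment per node
adj = torch.full((9, 9), 1.0 / 9)                # fully connected, row-normalized
out = SpeakerAwareGCNLayer(128, num_speakers=2)(h, speakers, adj)
```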
- Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation [87.01669173673288]
We propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network.
A co-attention mechanism is embedded in the EFN to realize the parallel update of multi-modal features.
The experiment results on four benchmark datasets demonstrate that the proposed approach achieves the state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-05-05T02:27:25Z)
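The parallel co-attention update can be sketched with a single shared affinity matrix: visual regions attend to words and words attend to regions in the same pass. The projections and residual form below are illustrative, not the EFN design.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Parallel update of visual and language features from one affinity matrix."""
    def __init__(self, dim):
        super().__init__()
        self.wv = nn.Linear(dim, dim)
        self.wl = nn.Linear(dim, dim)

    def forward(self, vis, lang):
        # vis: (batch, regions, dim), lang: (batch, words, dim)
        affinity = self.wv(vis) @ self.wl(lang).transpose(1, 2)                  # (batch, R, W)
        vis_new = vis + torch.softmax(affinity, dim=-1) @ lang                   # regions gather word context
        lang_new = lang + torch.softmax(affinity.transpose(1, 2), dim=-1) @ vis  # words gather region context
        return vis_new, lang_new

vis, lang = CoAttention(128)(torch.randn(2, 49, 128), torch.randn(2, 12, 128))
```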
- A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z)
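A sketch of one graph-based fusion layer over a unified multimodal graph: word and image-region nodes share a single attention layer whose mask is the graph's adjacency, so intra- and inter-modal semantic interactions happen only along declared edges. The toy fully connected adjacency and module names are assumptions, and a real encoder would stack several such layers.

```python
import torch
import torch.nn as nn

class GraphFusionLayer(nn.Module):
    """One fusion layer over a unified graph of word and image-region nodes;
    non-edges are masked out of the attention."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, nodes, adj):
        # nodes: (batch, num_nodes, dim); adj: (num_nodes, num_nodes) bool, True where an edge exists
        h, _ = self.attn(nodes, nodes, nodes, attn_mask=~adj)  # block attention on non-edges
        h = nodes + h
        return h + self.ffn(h)

words, regions = torch.randn(2, 10, 128), torch.randn(2, 5, 128)
nodes = torch.cat([words, regions], dim=1)       # unified multimodal graph: 15 nodes
adj = torch.ones(15, 15, dtype=torch.bool)       # toy graph: fully connected
out = GraphFusionLayer(128)(nodes, adj)
```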