Sequential Late Fusion Technique for Multi-modal Sentiment Analysis
- URL: http://arxiv.org/abs/2106.11473v1
- Date: Tue, 22 Jun 2021 01:32:41 GMT
- Title: Sequential Late Fusion Technique for Multi-modal Sentiment Analysis
- Authors: Debapriya Banerjee, Fotios Lygerakis, Fillia Makedon
- Abstract summary: We use text, audio and visual modalities from the MOSI dataset.
We propose a novel fusion technique using a multi-head attention LSTM network.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal sentiment analysis plays an important role in providing better interactive experiences to users. Each modality in multi-modal data can provide different viewpoints or reveal unique aspects of a user's emotional state. In this work, we use text, audio and visual modalities from the MOSI dataset and we propose a novel fusion technique using a multi-head attention LSTM network. Finally, we perform a classification task and evaluate its performance.
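The abstract names the components (per-modality MOSI features, an LSTM, multi-head attention, late fusion, a final classifier) but not how they are wired together. The sketch below is only a hedged illustration of one plausible wiring: each modality is encoded by its own LSTM and the resulting summaries are fused late with multi-head attention before classification. The feature dimensions, fusion order, and class count are assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): late fusion of text/audio/visual
# sequences with per-modality LSTMs and a multi-head attention fusion step.
import torch
import torch.nn as nn

class LateFusionSentiment(nn.Module):
    def __init__(self, dims=None, hidden=128, heads=4, num_classes=2):
        super().__init__()
        # Assumed per-modality feature sizes for MOSI-style inputs (illustrative).
        dims = dims or {"text": 300, "audio": 74, "visual": 47}
        # One LSTM encoder per modality; no early mixing (late fusion).
        self.encoders = nn.ModuleDict(
            {m: nn.LSTM(d, hidden, batch_first=True) for m, d in dims.items()})
        # Multi-head attention over the stacked modality summaries.
        self.fusion = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, batch):
        # batch: dict mapping modality name -> (B, T, d_modality) sequence
        summaries = []
        for name, encoder in self.encoders.items():
            _, (h, _) = encoder(batch[name])       # h: (num_layers, B, hidden)
            summaries.append(h[-1])                # final hidden state per modality
        tokens = torch.stack(summaries, dim=1)     # (B, num_modalities, hidden)
        fused, _ = self.fusion(tokens, tokens, tokens)
        return self.classifier(fused.mean(dim=1))  # (B, num_classes)

model = LateFusionSentiment()
dummy = {"text": torch.randn(8, 20, 300),
         "audio": torch.randn(8, 20, 74),
         "visual": torch.randn(8, 20, 47)}
print(model(dummy).shape)  # torch.Size([8, 2])
```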
Related papers
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
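The JMT entry above mentions fusion with key-based cross-attention but gives no further detail. Below is only a generic cross-attention sketch, assuming one modality's tokens act as queries over another modality's keys and values; it is not the paper's actual module.

```python
# Generic cross-attention fusion sketch (illustrative, not the JMT code):
# audio tokens query visual tokens, so audio is enriched with visual context.
import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
audio = torch.randn(8, 50, 128)   # (batch, audio frames, feature dim)
visual = torch.randn(8, 30, 128)  # (batch, video frames, feature dim)

# Queries come from audio; keys/values come from visual.
fused_audio, attn_weights = cross_attn(query=audio, key=visual, value=visual)
print(fused_audio.shape)  # torch.Size([8, 50, 128])
```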
- Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation [9.817888267356716]
Multimodal Emotion Recognition in Conversation (ERC) faces two problems.
Deep emotion cue extraction was performed on modalities with strong representation ability.
Feature filters were designed as multimodal prompt information for modalities with weak representation ability.
MPT embeds multimodal fusion information into each attention layer of the Transformer.
arXiv Detail & Related papers (2023-10-04T13:54:46Z)
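The MPT entry above says multimodal fusion information is embedded into each attention layer of the Transformer. One simple, assumed way to realize that idea is to prepend fused prompt vectors to the keys and values of a self-attention layer, as sketched below; the real MPT design may differ.

```python
# Sketch of injecting fused "prompt" vectors into a self-attention layer by
# prepending them to the keys/values. Illustrative assumption, not MPT itself.
import torch
import torch.nn as nn

class PromptedSelfAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens, fusion_prompt):
        # tokens: (B, T, dim) text tokens; fusion_prompt: (B, P, dim) fused info
        kv = torch.cat([fusion_prompt, tokens], dim=1)  # extend keys/values
        out, _ = self.attn(query=tokens, key=kv, value=kv)
        return out

layer = PromptedSelfAttention()
out = layer(torch.randn(8, 40, 128), torch.randn(8, 4, 128))
print(out.shape)  # torch.Size([8, 40, 128])
```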
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
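The IMQ entry above describes queries that adaptively aggregate global contextual cues within each modality. A common pattern matching that description, used here purely as an assumption, is a set of learned query embeddings that cross-attend over a modality's tokens.

```python
# Sketch of "implicit queries": learned query embeddings that aggregate
# global cues from one modality's tokens via cross-attention (an assumed
# generic mechanism, not the paper's exact IMQ module).
import torch
import torch.nn as nn

class ImplicitQueryPool(nn.Module):
    def __init__(self, dim=256, heads=8, num_queries=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, modality_tokens):
        # modality_tokens: (B, N, dim), e.g. image patches or text tokens
        B = modality_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, Q, dim)
        pooled, _ = self.attn(q, modality_tokens, modality_tokens)
        return pooled  # (B, Q, dim) global context summary for this modality

pool = ImplicitQueryPool()
print(pool(torch.randn(4, 196, 256)).shape)  # torch.Size([4, 8, 256])
```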
- Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition [81.2011058113579]
We argue that both the feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps.
We propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration.
Our system consistently achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-08-08T18:11:27Z)
- Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method [3.0903319879656084]
This paper introduces an innovative approach to feature alignment that revolutionizes the fusion of multimodal information.
Our method employs a novel iterative process of telescopic displacement and expansion of feature representations across different modalities, culminating in a coherent unified representation within a shared feature space.
arXiv Detail & Related papers (2023-06-29T13:49:06Z)
- M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation [1.3864478040954673]
We propose a Multi-modal Fusion Network (M2FNet) that extracts emotion-relevant features from the visual, audio, and text modalities.
It employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data.
The proposed feature extractor is trained with a novel adaptive margin-based triplet loss function to learn emotion-relevant features from the audio and visual data.
arXiv Detail & Related papers (2022-06-05T14:18:58Z)
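The M2FNet entry above mentions an adaptive margin-based triplet loss without specifying how the margin adapts. The sketch below is a hedged placeholder in which the margin simply grows with the anchor-negative distance; the actual adaptation rule in the paper may be entirely different.

```python
# Sketch of a triplet loss whose margin adapts per example. Scaling the margin
# with the anchor-negative distance is an illustrative choice, not M2FNet's rule.
import torch
import torch.nn.functional as F

def adaptive_margin_triplet_loss(anchor, positive, negative,
                                 base_margin=0.2, scale=0.1):
    # anchor/positive/negative: (B, D) embeddings from the feature extractor
    d_ap = F.pairwise_distance(anchor, positive)   # (B,)
    d_an = F.pairwise_distance(anchor, negative)   # (B,)
    margin = base_margin + scale * d_an.detach()   # per-sample adaptive margin
    return F.relu(d_ap - d_an + margin).mean()

a, p, n = (torch.randn(16, 128, requires_grad=True) for _ in range(3))
loss = adaptive_margin_triplet_loss(a, p, n)
loss.backward()
print(float(loss))
```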
- Multimodal Image Synthesis and Editing: The Generative AI Era [131.9569600472503]
Multimodal image synthesis and editing has become a hot research topic in recent years.
We comprehensively contextualize recent advances in multimodal image synthesis and editing.
We describe benchmark datasets and evaluation metrics as well as corresponding experimental results.
arXiv Detail & Related papers (2021-12-27T10:00:16Z)
- Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation [87.01669173673288]
We propose an encoder fusion network (EFN), which transforms the visual encoder into a multi-modal feature learning network.
A co-attention mechanism is embedded in the EFN to realize the parallel update of multi-modal features.
The experimental results on four benchmark datasets demonstrate that the proposed approach achieves state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-05-05T02:27:25Z)
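The EFN entry above mentions a co-attention mechanism that updates multi-modal features in parallel. The sketch below shows only the generic idea, with visual and language features attending to each other from their original states so both updates can run in parallel; it is not the EFN module itself.

```python
# Generic co-attention sketch: visual and language features attend to each
# other and are updated in parallel. Illustrative only, not the EFN design.
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.l2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2l = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, visual, language):
        # Both updates read the *original* features, so they happen in parallel.
        new_visual, _ = self.l2v(visual, language, language)
        new_language, _ = self.v2l(language, visual, visual)
        return visual + new_visual, language + new_language

coattn = CoAttention()
v, l = coattn(torch.randn(2, 196, 256), torch.randn(2, 20, 256))
print(v.shape, l.shape)  # torch.Size([2, 196, 256]) torch.Size([2, 20, 256])
```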
- Targeted aspect based multimodal sentiment analysis: an attention capsule extraction and multi-head fusion network [0.0]
We propose targeted aspect-based multimodal sentiment analysis (TABMSA) for the first time.
An attention capsule extraction and multi-head fusion network (EF-Net) is devised for the TABMSA task.
We evaluate the proposed model on two manually annotated datasets.
arXiv Detail & Related papers (2021-03-13T09:11:24Z)
- MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis [48.776247141839875]
We propose a novel framework, MISA, which projects each modality to two distinct subspaces.
The first subspace is modality-invariant, where the representations across modalities learn their commonalities and reduce the modality gap.
Our experiments on popular sentiment analysis benchmarks, MOSI and MOSEI, demonstrate significant gains over state-of-the-art models.
arXiv Detail & Related papers (2020-05-07T15:13:23Z)
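The MISA entry above describes projecting each modality into a modality-invariant and a modality-specific subspace. The sketch below uses a shared projection for the invariant part and private projections for the specific part, with a simple similarity penalty and an orthogonality penalty standing in for the paper's actual losses; both loss choices are assumptions.

```python
# Sketch of projecting each modality into shared (invariant) and private
# (specific) subspaces. The concrete losses here (MSE similarity, per-sample
# orthogonality penalty) are simplified stand-ins, not MISA's exact objectives.
import torch
import torch.nn as nn

dim, hidden = 128, 64
modalities = ["text", "audio", "visual"]
shared = nn.Linear(dim, hidden)                                    # invariant
private = nn.ModuleDict({m: nn.Linear(dim, hidden) for m in modalities})

feats = {m: torch.randn(8, dim) for m in modalities}
invariant = {m: shared(x) for m, x in feats.items()}
specific = {m: private[m](x) for m, x in feats.items()}

# Similarity loss: pull invariant projections of different modalities together.
sim_loss = sum(torch.mean((invariant["text"] - invariant[m]) ** 2)
               for m in ["audio", "visual"])
# Difference loss: keep each modality's two subspaces near-orthogonal.
diff_loss = sum(torch.mean((invariant[m] * specific[m]).sum(dim=1) ** 2)
                for m in modalities)
print(float(sim_loss), float(diff_loss))
```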
- Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation [2.588973722689844]
Multi-modal Emotion Detection and Sentiment Analysis can be particularly useful.
Current multi-modal systems fail to leverage and capture the context of the conversation.
We propose an end-to-end RNN architecture that attempts to address all the mentioned drawbacks.
arXiv Detail & Related papers (2020-02-19T16:21:00Z)
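The Multilogue-Net entry above only says that an end-to-end RNN captures conversational context. As a rough, assumed illustration of that idea, the sketch below runs a GRU over per-utterance fused vectors so each prediction can see earlier turns; the actual architecture is more involved.

```python
# Generic sketch of conversation-level context modeling: a GRU runs over the
# sequence of fused utterance vectors so each sentiment prediction can use
# earlier turns. Purely illustrative; not the Multilogue-Net architecture.
import torch
import torch.nn as nn

utterance_dim, hidden, num_classes = 128, 64, 3
context_rnn = nn.GRU(utterance_dim, hidden, batch_first=True)
classifier = nn.Linear(hidden, num_classes)

# (batch of conversations, utterances per conversation, fused utterance dim)
conversations = torch.randn(4, 12, utterance_dim)
context, _ = context_rnn(conversations)         # context-aware utterance states
logits = classifier(context)                    # one prediction per utterance
print(logits.shape)  # torch.Size([4, 12, 3])
```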
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.