Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis
- URL: http://arxiv.org/abs/2107.13669v1
- Date: Wed, 28 Jul 2021 23:33:42 GMT
- Title: Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis
- Authors: Wei Han, Hui Chen, Alexander Gelbukh, Amir Zadeh, Louis-Philippe
Morency, and Soujanya Poria
- Abstract summary: Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
- Score: 96.46952672172021
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Multimodal sentiment analysis aims to extract and integrate semantic
information collected from multiple modalities to recognize the expressed
emotions and sentiment in multimodal data. This research area's major concern
lies in developing an effective fusion scheme that can extract and integrate
key information from various modalities. However, one issue that may hold
previous work back from reaching a higher level is the lack of proper modeling
of the dynamics of the competition between independence and relevance among
modalities, which can deteriorate fusion outcomes by causing the collapse of
modality-specific feature space or by introducing extra noise. To mitigate
this, we propose the Bi-Bimodal Fusion Network (BBFN), a novel end-to-end
network that performs fusion (relevance increment) and separation (difference
increment) on pairwise modality representations. The two parts are trained
simultaneously so that the competition between them is simulated. The model
takes two bimodal pairs as input due to the known information imbalance among
modalities. In addition, we leverage a gated control mechanism in the
Transformer architecture to further improve the final output. Experimental
results on three datasets (CMU-MOSI, CMU-MOSEI, and UR-FUNNY) verify that our
model significantly outperforms the SOTA. The implementation of this work is
available at https://github.com/declare-lab/BBFN.
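To make the fused-vs-separated representation idea above concrete, here is a minimal PyTorch-style sketch of a single bimodal fusion-separation block with a gated control signal. It is only an assumption-level reconstruction from the abstract, not the authors' released code (see the GitHub link above for that); the module name BimodalFusionSeparation, the gating formulation, and all hyperparameters are hypothetical.
```python
import torch
import torch.nn as nn

class BimodalFusionSeparation(nn.Module):
    """Hypothetical sketch of one bimodal fusion-separation block.

    It produces a fused (relevance-increment) representation via cross-modal
    attention and a separated (difference-increment) representation, with a
    sigmoid gate controlling how much fused information is kept. Names and
    structure are illustrative, not the official BBFN implementation.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Cross-modal attention: modality A queries modality B (fusion part).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Projection used for the separation (difference) part.
        self.separate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        # Gated control: decides how much of the fused signal passes through.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (batch, seq_len, dim) sequences from two modalities.
        fused, _ = self.cross_attn(query=a, key=b, value=b)      # relevance increment
        diff = self.separate(torch.cat([a, a - fused], dim=-1))  # difference increment
        g = self.gate(torch.cat([a, fused], dim=-1))             # gated control signal
        return g * fused + (1.0 - g) * diff                      # balance fusion vs. separation


# Toy usage mirroring the abstract's two bimodal pairs (text-visual, text-acoustic).
if __name__ == "__main__":
    t = torch.randn(8, 20, 64)   # text features
    v = torch.randn(8, 20, 64)   # visual features (assumed aligned to text length)
    a = torch.randn(8, 20, 64)   # acoustic features
    block = BimodalFusionSeparation(dim=64)
    tv, ta = block(t, v), block(t, a)
    sentiment_head = nn.Linear(2 * 64, 1)
    score = sentiment_head(torch.cat([tv.mean(dim=1), ta.mean(dim=1)], dim=-1))
    print(score.shape)  # torch.Size([8, 1])
```
The toy usage pairs text against the visual and acoustic streams because the abstract notes an information imbalance among modalities; how the two bimodal outputs are combined for the final prediction is an assumption here.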
Related papers
- GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis [0.0]
Multimodal Sentiment Analysis (MSA) leverages multiple data modals to analyze human sentiment.
Existing MSA models generally employ cutting-edge multimodal fusion and representation learning-based methods to promote MSA capability.
Our proposed GSIFN incorporates two main components to solve these problems: (i) a graph-structured and interlaced-masked multimodal Transformer.
It adopts the Interlaced Mask mechanism to construct robust multimodal graph embedding, achieve all-modal-in-one Transformer-based fusion, and greatly reduce the computational overhead.
arXiv Detail & Related papers (2024-08-27T06:44:28Z)
- Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model [18.19558762805031]
This paper proposes the Coupled SSM model for coupling the state chains of multiple modalities while maintaining the independence of intra-modality state processes.
Experiments on CMU-MOSEI, CH-SIMS, and CH-SIMSV2 with multi-domain input verify the effectiveness of our model.
Results demonstrate that the Coupled Mamba model is capable of enhanced multi-modal fusion.
arXiv Detail & Related papers (2024-05-28T09:57:03Z)
- U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation [63.31007867379312]
We introduce U3M: An Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation.
We employ feature fusion at multiple scales to ensure the effective extraction and integration of both global and local features.
Experimental results demonstrate that our approach achieves superior performance across multiple datasets.
arXiv Detail & Related papers (2024-05-24T08:58:48Z)
- FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients [32.59184269562571]
We propose a multi-modal collaborative diffusion federated learning framework called FedDiff.
Our framework establishes a dual-branch diffusion model feature extraction setup, where the data from the two modalities are fed into separate branches of the encoder.
Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure.
arXiv Detail & Related papers (2023-11-16T02:29:37Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend the multimodal input.
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis [47.29528724322795]
Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently.
Despite significant progress, there are still two major challenges on the way towards robust MSA.
We propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR).
arXiv Detail & Related papers (2022-08-16T08:02:30Z)
- Improving Multimodal fusion via Mutual Dependency Maximisation [5.73995120847626]
Multimodal sentiment analysis is a trending area of research, and multimodal fusion is one of its most active topics.
In this work, we investigate unexplored penalties and propose a set of new objectives that measure the dependency between modalities.
We demonstrate that our new penalties lead to a consistent improvement (up to 4.3 on accuracy) across a large variety of state-of-the-art models; a minimal sketch of such a dependency objective is given after this entry.
arXiv Detail & Related papers (2021-08-31T06:26:26Z)
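As a toy illustration of the dependency-maximisation idea in the entry above, the following sketch adds a contrastive (InfoNCE-style) dependency term between two modality embeddings. This is a hedged stand-in rather than the specific objectives proposed in that paper; the function name dependency_penalty, the temperature, and the 0.1 weight are all assumptions.
```python
import torch
import torch.nn.functional as F

def dependency_penalty(z_text: torch.Tensor, z_audio: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """Toy InfoNCE-style dependency term between two modality embeddings.

    Minimising the returned loss (i.e. maximising dependency) encourages paired
    text/audio representations from the same sample to agree more than
    representations from different samples. Illustrative only, not the paper's
    actual objectives.
    """
    z_text = F.normalize(z_text, dim=-1)    # (batch, dim)
    z_audio = F.normalize(z_audio, dim=-1)  # (batch, dim)
    logits = z_text @ z_audio.t() / temperature           # pairwise similarities
    targets = torch.arange(z_text.size(0), device=z_text.device)
    # Symmetric cross-entropy: each sample should match its own pair in both directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


# Usage: add the penalty to a task loss with a small (hypothetical) weight.
if __name__ == "__main__":
    z_t, z_a = torch.randn(16, 64), torch.randn(16, 64)
    task_loss = torch.tensor(0.7)            # placeholder sentiment loss
    total_loss = task_loss + 0.1 * dependency_penalty(z_t, z_a)
    print(total_loss.item())
```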