Deep-HOSeq: Deep Higher Order Sequence Fusion for Multimodal Sentiment
Analysis
- URL: http://arxiv.org/abs/2010.08218v1
- Date: Fri, 16 Oct 2020 08:02:11 GMT
- Title: Deep-HOSeq: Deep Higher Order Sequence Fusion for Multimodal Sentiment
Analysis
- Authors: Sunny Verma, Jiwei Wang, Zhefeng Ge, Rujia Shen, Fan Jin, Yang Wang,
Fang Chen, and Wei Liu
- Abstract summary: Multimodal sentiment analysis utilizes multiple heterogeneous modalities for sentiment classification.
Recent multimodal fusion schemes customize LSTMs to discover intra-modal dynamics.
We propose a common network to discover both intra-modal and inter-modal dynamics.
- Score: 12.386788662621338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal sentiment analysis utilizes multiple heterogeneous modalities for
sentiment classification. The recent multimodal fusion schemes customize LSTMs
to discover intra-modal dynamics and design sophisticated attention mechanisms
to discover the inter-modal dynamics from multimodal sequences. Although
powerful, these schemes rely entirely on attention mechanisms, which is
problematic due to two major drawbacks: 1) deceptive attention masks, and 2)
training dynamics. Moreover, strenuous efforts are required to optimize the
hyperparameters of these consolidated architectures, in particular their
custom-designed LSTMs constrained by attention schemes. In this research, we
first propose a common network to discover both intra-modal and inter-modal
dynamics by utilizing basic LSTMs and tensor-based convolution networks. We
then propose unique networks to encapsulate temporal granularity among the
modalities, which is essential when extracting information from asynchronous
sequences. We then integrate these two kinds of information via a fusion layer
and call our novel multimodal fusion scheme Deep-HOSeq (Deep network with
higher order Common and Unique Sequence information). The proposed Deep-HOSeq
efficiently discovers all-important information from multimodal sequences and
the effectiveness of utilizing both types of information is empirically
demonstrated on CMU-MOSEI and CMU-MOSI benchmark datasets. The source code of
our proposed Deep-HOSeq is available at
https://github.com/sverma88/Deep-HOSeq--ICDM-2020.
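The following is a minimal PyTorch sketch of the architecture described above: per-modality LSTMs for intra-modal dynamics, a tensor-based convolution over their outer product for inter-modal (higher-order) dynamics in the common sub-network, a unique sub-network that keeps per-modality temporal summaries separate, and a fusion layer combining both. The three modalities, CMU-MOSI-style feature dimensions, hidden sizes, and the unique sub-network design are illustrative assumptions, not the authors' implementation; see the repository linked above for the original code.

```python
import torch
import torch.nn as nn


class CommonSubNet(nn.Module):
    """Intra-modal LSTMs plus a tensor-based convolution for inter-modal dynamics."""

    def __init__(self, in_dims, hid=16, out_dim=64):
        super().__init__()
        self.lstms = nn.ModuleList(
            [nn.LSTM(d, hid, batch_first=True) for d in in_dims]
        )
        # Convolve the 3-way outer-product tensor as a single-channel 3D volume.
        self.conv = nn.Conv3d(1, 8, kernel_size=3, padding=1)
        self.proj = nn.Linear(8 * hid ** 3, out_dim)

    def forward(self, xs):  # xs: list of (batch, T_m, d_m), one per modality
        h = [lstm(x)[1][0][-1] for lstm, x in zip(self.lstms, xs)]  # (batch, hid) each
        t, a, v = h
        outer = torch.einsum('bi,bj,bk->bijk', t, a, v)   # higher-order cross-modal tensor
        z = self.conv(outer.unsqueeze(1))                 # (batch, 8, hid, hid, hid)
        return torch.relu(self.proj(z.flatten(1)))


class UniqueSubNet(nn.Module):
    """Keeps per-modality temporal summaries separate until the fusion layer."""

    def __init__(self, in_dims, hid=32, out_dim=64):
        super().__init__()
        self.lstms = nn.ModuleList(
            [nn.LSTM(d, hid, batch_first=True) for d in in_dims]
        )
        self.proj = nn.Linear(hid * len(in_dims), out_dim)

    def forward(self, xs):
        h = [lstm(x)[1][0][-1] for lstm, x in zip(self.lstms, xs)]
        return torch.relu(self.proj(torch.cat(h, dim=-1)))


class DeepHOSeqSketch(nn.Module):
    """Fusion layer integrating the common and unique sub-network outputs."""

    def __init__(self, in_dims=(300, 74, 35), num_out=1):
        super().__init__()
        self.common = CommonSubNet(in_dims)
        self.unique = UniqueSubNet(in_dims)
        self.fusion = nn.Linear(64 + 64, num_out)

    def forward(self, text, audio, visual):
        xs = [text, audio, visual]
        return self.fusion(torch.cat([self.common(xs), self.unique(xs)], dim=-1))


# Toy usage: sequences may have different lengths per modality (asynchronous).
model = DeepHOSeqSketch()
out = model(torch.randn(4, 20, 300), torch.randn(4, 50, 74), torch.randn(4, 50, 35))
print(out.shape)  # torch.Size([4, 1]) -- a regression-style sentiment score
```

The outer product captures higher-order interactions across all modalities at once without attention, while the unique branch preserves each modality's own temporal summary; the fusion layer then combines the two kinds of information, mirroring the common/unique split described in the abstract.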
Related papers
- Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding [51.96911650437978]
Multi-modal fusion has played a vital role in multi-modal scene understanding.
Most existing methods focus on cross-modal fusion involving two modalities, often overlooking more complex multi-modal fusion.
We propose a relational Part-Whole Fusion (PWRF) framework for multi-modal scene understanding.
arXiv Detail & Related papers (2024-10-19T02:27:30Z) - Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on
Resource-constrained Devices [0.4915744683251151]
We propose a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices.
Harmonic-NAS achieves 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain.
arXiv Detail & Related papers (2023-09-12T21:37:26Z) - Asymmetric double-winged multi-view clustering network for exploring
Diverse and Consistent Information [28.300395619444796]
In unsupervised scenarios, deep contrastive multi-view clustering (DCMVC) is becoming a research hotspot.
We propose a novel multi-view clustering network termed CodingNet to explore the diverse and consistent information simultaneously.
Our framework's efficacy is validated through extensive experiments on six widely used benchmark datasets.
arXiv Detail & Related papers (2023-09-01T14:13:22Z) - Deep Equilibrium Multimodal Fusion [88.04713412107947]
Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently.
We propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process.
Experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion.
arXiv Detail & Related papers (2023-06-29T03:02:20Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Attention Bottlenecks for Multimodal Fusion [90.75885715478054]
Machine perception models are typically modality-specific and optimised for unimodal benchmarks.
We introduce a novel transformer-based architecture that uses 'fusion bottlenecks' for modality fusion at multiple layers.
We conduct thorough ablation studies, and achieve state-of-the-art results on multiple audio-visual classification benchmarks.
arXiv Detail & Related papers (2021-06-30T22:44:12Z) - Multi-Level Attentive Convolutional Neural Network for Crowd Counting [12.61997540961144]
We propose a multi-level attentive Convolutional Neural Network (MLAttnCNN) for crowd counting.
We extract high-level contextual information with multiple different scales applied in pooling.
We use multi-level attention modules to enrich the characteristics at different layers to achieve more efficient multi-scale feature fusion.
arXiv Detail & Related papers (2021-05-24T17:29:00Z) - A novel multimodal fusion network based on a joint coding model for lane
line segmentation [22.89466867866239]
We introduce a novel multimodal fusion architecture from an information theory perspective.
We demonstrate its practical utility using LiDAR camera fusion networks.
Our optimal fusion network achieves 85%+ lane line accuracy and 98.7%+ overall accuracy.
arXiv Detail & Related papers (2021-03-20T06:47:58Z) - M2Net: Multi-modal Multi-channel Network for Overall Survival Time
Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients.
Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume.
We propose an end-to-end OS time prediction model, namely the Multi-modal Multi-channel Network (M2Net).
arXiv Detail & Related papers (2020-06-01T05:21:37Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)