Exploring Multimodal Sentiment Analysis via CBAM Attention and
Double-layer BiLSTM Architecture
- URL: http://arxiv.org/abs/2303.14708v1
- Date: Sun, 26 Mar 2023 12:34:01 GMT
- Title: Exploring Multimodal Sentiment Analysis via CBAM Attention and
Double-layer BiLSTM Architecture
- Authors: Huiru Wang, Xiuhong Li, Zenyu Ren, Dan Yang, Chunming Ma
- Abstract summary: In our model, we use BERT + BiLSTM as a new feature extractor to capture long-distance dependencies in sentences (a minimal sketch of this text path follows the abstract below).
To remove redundant information, a CNN and CBAM attention are added after concatenating the text and image features.
The experimental results show that our model performs well, comparable to advanced models.
- Score: 3.9850392954445875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Because multimodal data contains more modal information, multimodal sentiment
analysis has become a recent research hotspot. However, redundant information
is easily involved in feature fusion after feature extraction, which has a
certain impact on the feature representation after fusion. Therefore, in this
paper, we propose a new multimodal sentiment analysis model. In our model, we
use BERT + BiLSTM as a new feature extractor to capture the long-distance
dependencies in sentences and consider the position information of input
sequences to obtain richer text features. To remove redundant information and
make the network pay more attention to the correlation between image and text
features, a CNN and CBAM attention are added after concatenating the text and
image features to improve the feature representation ability. On the
MVSA-single dataset and HFM dataset, compared with the baseline model, the ACC
of our model improves by 1.78% and 1.91%, and the F1 score improves by
3.09% and 2.0%, respectively. The experimental results show that our model
performs well, comparable to advanced models.
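To make the pipeline described in the abstract concrete, the following is a minimal sketch, assuming PyTorch and the Hugging Face transformers library, of the "BERT + BiLSTM" text feature extractor; the model name, hidden size, and pooling choice are illustrative assumptions rather than the paper's reported configuration.

```python
# Hedged sketch of a BERT + double-layer BiLSTM text encoder (assumed
# hyperparameters; not the authors' released implementation).
import torch.nn as nn
from transformers import AutoModel


class BertBiLSTMEncoder(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", hidden=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              num_layers=2,              # double-layer BiLSTM
                              bidirectional=True, batch_first=True)

    def forward(self, input_ids, attention_mask):
        # BERT's contextual embeddings already carry token position information
        tokens = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        out, _ = self.bilstm(tokens)       # (batch, seq_len, 2 * hidden)
        return out.mean(dim=1)             # pooled sentence feature
```

The fusion stage can then be sketched as a small CNN followed by a CBAM block (channel attention, then spatial attention) over the concatenated text and image feature maps. Again, the channel counts, reduction ratio, 7x7 spatial kernel, and three-class head below are assumptions for illustration only.

```python
# Hedged sketch of the CNN + CBAM fusion head over spliced text/image features.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """CBAM channel attention: avg- and max-pool spatially, share an MLP,
    and gate the channels with a sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """CBAM spatial attention: pool across channels, 7x7 conv, sigmoid gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAMFusionHead(nn.Module):
    """Concatenate text and image feature maps channel-wise, refine with a
    small CNN + CBAM to suppress redundant information, then classify."""
    def __init__(self, text_dim=512, image_channels=256, num_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(text_dim + image_channels, 256, 3, padding=1),
            nn.ReLU(inplace=True))
        self.cbam = nn.Sequential(ChannelAttention(256), SpatialAttention())
        self.head = nn.Linear(256, num_classes)

    def forward(self, text_feat, image_feat):
        # text_feat: (batch, text_dim) pooled sentence feature
        # image_feat: (batch, C, H, W) image CNN feature map
        b, _, h, w = image_feat.shape
        text_map = text_feat[:, :, None, None].expand(-1, -1, h, w)
        x = torch.cat([text_map, image_feat], dim=1)    # feature "splicing"
        x = self.cbam(self.conv(x))
        return self.head(x.mean(dim=(2, 3)))            # GAP + linear classifier
```

Applying channel attention before spatial attention lets the network first re-weight which fused channels carry complementary image-text information and then emphasize where in the feature map that correlation is strongest, matching the abstract's motivation of suppressing redundant information after fusion.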
Related papers
- Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.
It models two common event representations simultaneously, i.e., event images and event voxels.
We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., 90.51%, which exceeds the second place by +2.21%.
arXiv Detail & Related papers (2024-06-27T02:32:46Z) - I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal
Information Extraction [10.684005956288347]
We present the Intra- and Inter-Sample Relationship Modeling (I2SRM) method for this task.
Our proposed method achieves competitive results, 77.12% F1-score on Twitter-2015, 88.40% F1-score on Twitter-2017, and 84.12% F1-score on MNRE.
arXiv Detail & Related papers (2023-10-10T05:50:25Z) - Information Screening whilst Exploiting! Multimodal Relation Extraction
with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z) - Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity
Recognition [6.0306313759213275]
We propose a multi-modal framework that learns to effectively combine features from RGB Video and IMU sensors.
Our model is trained in two stages: in the first stage, each input encoder learns to effectively extract features.
We show significant improvements of 22% and 11% compared to video only, and 20% and 12% on the MMAct dataset.
arXiv Detail & Related papers (2022-11-08T15:48:44Z) - CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for
Multimodal Sentiment Detection [24.243349217940274]
We propose a Contrastive Learning and Multi-Layer Fusion (CLMLF) method for multimodal sentiment detection.
Specifically, we first encode text and image to obtain hidden representations, and then use a multi-layer fusion module to align and fuse the token-level features of text and image.
In addition to the sentiment analysis task, we also designed two contrastive learning tasks: label-based contrastive learning and data-based contrastive learning.
arXiv Detail & Related papers (2022-04-12T04:03:06Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Better Feature Integration for Named Entity Recognition [30.676768644145]
We propose a simple and robust solution to incorporate both types of features with our Synergized-LSTM (Syn-LSTM).
The results demonstrate that the proposed model achieves better performance than previous approaches while requiring fewer parameters.
arXiv Detail & Related papers (2021-04-12T09:55:06Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.