Exploring Multimodal Sentiment Analysis via CBAM Attention and
Double-layer BiLSTM Architecture
- URL: http://arxiv.org/abs/2303.14708v1
- Date: Sun, 26 Mar 2023 12:34:01 GMT
- Title: Exploring Multimodal Sentiment Analysis via CBAM Attention and
Double-layer BiLSTM Architecture
- Authors: Huiru Wang, Xiuhong Li, Zenyu Ren, Dan Yang, Chunming Ma
- Abstract summary: In our model, we use BERT + BiLSTM as a new feature extractor to capture long-distance dependencies in sentences (a minimal sketch of this text path follows the abstract below).
To remove redundant information, a CNN and CBAM attention are added after concatenating the text and image features.
The experimental results show that our model performs well, comparable to advanced models.
- Score: 3.9850392954445875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Because multimodal data contains more modal information, multimodal sentiment
analysis has become a recent research hotspot. However, redundant information
is easily involved in feature fusion after feature extraction, which has a
certain impact on the feature representation after fusion. Therefore, in this
paper, we propose a new multimodal sentiment analysis model. In our model, we
use BERT + BiLSTM as a new feature extractor to capture the long-distance
dependencies in sentences and consider the position information of input
sequences to obtain richer text features. To remove redundant information and
make the network pay more attention to the correlation between image and text
features, a CNN and CBAM attention are added after concatenating the text and
image features to improve the feature representation ability. On the
MVSA-single dataset and HFM dataset, compared with the baseline model, the ACC
of our model improves by 1.78% and 1.91%, and the F1 score improves by
3.09% and 2.0%, respectively. The experimental results show that our model
performs well, comparable to advanced models.
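To make the pipeline described in the abstract concrete, the following is a minimal sketch, assuming PyTorch and the Hugging Face transformers library, of the "BERT + BiLSTM" text feature extractor; the model name, hidden size, and pooling choice are illustrative assumptions rather than the paper's reported configuration.

```python
# Hedged sketch of a BERT + double-layer BiLSTM text encoder (assumed
# hyperparameters; not the authors' released implementation).
import torch.nn as nn
from transformers import AutoModel


class BertBiLSTMEncoder(nn.Module):
    def __init__(self, bert_name="bert-base-uncased", hidden=256):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                              num_layers=2,              # double-layer BiLSTM
                              bidirectional=True, batch_first=True)

    def forward(self, input_ids, attention_mask):
        # BERT's contextual embeddings already carry token position information
        tokens = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        out, _ = self.bilstm(tokens)       # (batch, seq_len, 2 * hidden)
        return out.mean(dim=1)             # pooled sentence feature
```

The fusion stage can then be sketched as a small CNN followed by a CBAM block (channel attention, then spatial attention) over the concatenated text and image feature maps. Again, the channel counts, reduction ratio, 7x7 spatial kernel, and three-class head below are assumptions for illustration only.

```python
# Hedged sketch of the CNN + CBAM fusion head over spliced text/image features.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """CBAM channel attention: avg- and max-pool spatially, share an MLP,
    and gate the channels with a sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)


class SpatialAttention(nn.Module):
    """CBAM spatial attention: pool across channels, 7x7 conv, sigmoid gate."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class CBAMFusionHead(nn.Module):
    """Concatenate text and image feature maps channel-wise, refine with a
    small CNN + CBAM to suppress redundant information, then classify."""
    def __init__(self, text_dim=512, image_channels=256, num_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(text_dim + image_channels, 256, 3, padding=1),
            nn.ReLU(inplace=True))
        self.cbam = nn.Sequential(ChannelAttention(256), SpatialAttention())
        self.head = nn.Linear(256, num_classes)

    def forward(self, text_feat, image_feat):
        # text_feat: (batch, text_dim) pooled sentence feature
        # image_feat: (batch, C, H, W) image CNN feature map
        b, _, h, w = image_feat.shape
        text_map = text_feat[:, :, None, None].expand(-1, -1, h, w)
        x = torch.cat([text_map, image_feat], dim=1)    # feature "splicing"
        x = self.cbam(self.conv(x))
        return self.head(x.mean(dim=(2, 3)))            # GAP + linear classifier
```

Applying channel attention before spatial attention lets the network first re-weight which fused channels carry complementary image-text information and then emphasize where in the feature map that correlation is strongest, matching the abstract's motivation of suppressing redundant information after fusion.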
Related papers
- Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.
It models two common event representations simultaneously, i.e., event images and event voxels.
We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., 90.51%, which exceeds the second place by +2.21%.
arXiv Detail & Related papers (2024-06-27T02:32:46Z) - I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal
Information Extraction [10.684005956288347]
We present the Intra- and Inter-Sample Relationship Modeling (I2SRM) method for this task.
Our proposed method achieves competitive results, 77.12% F1-score on Twitter-2015, 88.40% F1-score on Twitter-2017, and 84.12% F1-score on MNRE.
arXiv Detail & Related papers (2023-10-10T05:50:25Z) - Information Screening whilst Exploiting! Multimodal Relation Extraction
with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z) - Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z) - Named Entity and Relation Extraction with Multi-Modal Retrieval [51.660650522630526]
Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE.
We propose a novel Multi-modal Retrieval based framework (MoRe).
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image in the knowledge corpus respectively.
arXiv Detail & Related papers (2022-12-03T13:11:32Z) - Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity
Recognition [6.0306313759213275]
We propose a multi-modal framework that learns to effectively combine features from RGB Video and IMU sensors.
Our model is trained in two stages: in the first stage, each input encoder learns to effectively extract features.
We show significant improvements of 22% and 11% compared to video only, and 20% and 12% on the MMAct dataset.
arXiv Detail & Related papers (2022-11-08T15:48:44Z) - CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for
Multimodal Sentiment Detection [24.243349217940274]
We propose a Contrastive Learning and Multi-Layer Fusion (CLMLF) method for multimodal sentiment detection.
Specifically, we first encode text and image to obtain hidden representations, and then use a multi-layer fusion module to align and fuse the token-level features of text and image.
In addition to the sentiment analysis task, we also designed two contrastive learning tasks: label-based contrastive learning and data-based contrastive learning.
arXiv Detail & Related papers (2022-04-12T04:03:06Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Better Feature Integration for Named Entity Recognition [30.676768644145]
We propose a simple and robust solution to incorporate both types of features with our Synergized-LSTM (Syn-LSTM).
The results demonstrate that the proposed model achieves better performance than previous approaches while requiring fewer parameters.
arXiv Detail & Related papers (2021-04-12T09:55:06Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.