A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis
- URL: http://arxiv.org/abs/2103.02636v1
- Date: Wed, 3 Mar 2021 19:09:01 GMT
- Title: A Novel Context-Aware Multimodal Framework for Persian Sentiment Analysis
- Authors: Kia Dashtipour, Mandar Gogate, Erik Cambria, Amir Hussain
- Abstract summary: We present a first of its kind Persian multimodal dataset comprising more than 800 utterances.
We present a novel context-aware multimodal sentiment analysis framework.
We employ both decision-level (late) and feature-level (early) fusion methods to integrate affective cross-modal information.
- Score: 19.783517380422854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most recent works on sentiment analysis have exploited the text modality.
However, millions of hours of video recordings posted on social media platforms
every day hold vital unstructured information that can be exploited to more
effectively gauge public perception. Multimodal sentiment analysis offers an
innovative solution to computationally understand and harvest sentiments from
videos by contextually exploiting audio, visual and textual cues. In this
paper, we first present a first-of-its-kind Persian multimodal dataset
comprising more than 800 utterances, as a benchmark resource for researchers to
evaluate multimodal sentiment analysis approaches in the Persian language.
Second, we present a novel context-aware multimodal sentiment analysis
framework that simultaneously exploits acoustic, visual and textual cues to
more accurately determine the expressed sentiment. We employ both
decision-level (late) and feature-level (early) fusion methods to integrate
affective cross-modal information. Experimental results demonstrate that the
contextual integration of textual, acoustic and visual features delivers
better performance (91.39%) than unimodal features alone (89.24%).
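To make the fusion terminology concrete, below is a minimal, self-contained sketch of feature-level (early) versus decision-level (late) fusion over three modalities. It uses synthetic random features and simple logistic-regression classifiers purely for illustration; the feature dimensions, classifiers, and extraction steps are assumptions and do not reproduce the authors' actual models or results.

```python
# Illustrative sketch of early vs. late fusion; all features are synthetic
# placeholders, not the paper's extracted acoustic/visual/textual features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_utterances = 800                              # dataset size mentioned in the abstract
text = rng.normal(size=(n_utterances, 300))     # assumed textual embeddings
audio = rng.normal(size=(n_utterances, 74))     # assumed acoustic descriptors
video = rng.normal(size=(n_utterances, 35))     # assumed visual (facial) features
y = rng.integers(0, 2, size=n_utterances)       # binary sentiment labels

idx_train, idx_test = train_test_split(
    np.arange(n_utterances), test_size=0.2, random_state=0)

# Feature-level (early) fusion: concatenate modality features, train one classifier.
early_X = np.hstack([text, audio, video])
early_clf = LogisticRegression(max_iter=1000).fit(early_X[idx_train], y[idx_train])
early_acc = accuracy_score(y[idx_test], early_clf.predict(early_X[idx_test]))

# Decision-level (late) fusion: train one classifier per modality,
# then average their predicted class probabilities.
probs = []
for X in (text, audio, video):
    clf = LogisticRegression(max_iter=1000).fit(X[idx_train], y[idx_train])
    probs.append(clf.predict_proba(X[idx_test]))
late_pred = np.mean(probs, axis=0).argmax(axis=1)
late_acc = accuracy_score(y[idx_test], late_pred)

print(f"early fusion accuracy: {early_acc:.3f}, late fusion accuracy: {late_acc:.3f}")
```

Early fusion lets a single model exploit cross-modal correlations in one joint feature space, whereas late fusion keeps per-modality models independent and only combines their predictions, which is often more robust when one modality is noisy or missing.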
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing multimodal conversational Aspect-based Sentiment Analysis (ABSA).
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Contrastive Learning-based Multi Modal Architecture for Emoticon Prediction by Employing Image-Text Pairs [13.922091192207718]
This research aims to analyze the relationship among sentences, visuals, and emoticons.
We have proposed a novel contrastive learning based multimodal architecture.
The proposed model attained an accuracy of 91% and an MCC-score of 90% while assessing emoticons.
arXiv Detail & Related papers (2024-08-05T15:45:59Z)
- Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey [66.166184609616]
ChatGPT has opened up immense potential for applying large language models (LLMs) to text-centric multimodal tasks.
It is still unclear how existing LLMs can adapt better to text-centric multimodal sentiment analysis tasks.
arXiv Detail & Related papers (2024-06-12T10:36:27Z)
- WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge [73.76722241704488]
We propose a plug-in framework named WisdoM to leverage the contextual world knowledge induced from the large vision-language models (LVLMs) for enhanced multimodal sentiment analysis.
We show that our approach has substantial improvements over several state-of-the-art methods.
arXiv Detail & Related papers (2024-01-12T16:08:07Z)
- Holistic Visual-Textual Sentiment Analysis with Prior Models [64.48229009396186]
We propose a holistic method that achieves robust visual-textual sentiment analysis.
The proposed method consists of four parts: (1) a visual-textual branch to learn features directly from data for sentiment analysis, (2) a visual expert branch with a set of pre-trained "expert" encoders to extract selected semantic visual features, (3) a CLIP branch to implicitly model visual-textual correspondence, and (4) a multimodal feature fusion network based on BERT to fuse multimodal features and make sentiment predictions.
arXiv Detail & Related papers (2022-11-23T14:40:51Z)
- Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning that incorporates data from various sources has become an increasingly popular research area.
We analyze the commonness and uniqueness of each data format, mainly covering vision, audio, text, and motion.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z)
- Video Sentiment Analysis with Bimodal Information-augmented Multi-Head Attention [7.997124140597719]
This study focuses on the sentiment analysis of videos containing time series data of multiple modalities.
The key problem is how to fuse these heterogeneous data.
Based on bimodal interaction, more important bimodal features are assigned larger weights.
arXiv Detail & Related papers (2021-03-03T12:30:11Z)
- An AutoML-based Approach to Multimodal Image Sentiment Analysis [1.0499611180329804]
We propose a method that combines both textual and image individual sentiment analysis into a final fused classification based on AutoML.
Our method achieved state-of-the-art performance in the B-T4SA dataset, with 95.19% accuracy.
arXiv Detail & Related papers (2021-02-16T11:28:50Z)
- Multimodal Categorization of Crisis Events in Social Media [81.07061295887172]
We present a new multimodal fusion method that leverages both images and texts as input.
In particular, we introduce a cross-attention module that can filter uninformative and misleading components from weak modalities.
We show that our method outperforms the unimodal approaches and strong multimodal baselines by a large margin on three crisis-related tasks.
arXiv Detail & Related papers (2020-04-10T06:31:30Z)