Ensemble Language Models for Multilingual Sentiment Analysis
- URL: http://arxiv.org/abs/2403.06060v1
- Date: Sun, 10 Mar 2024 01:39:10 GMT
- Title: Ensemble Language Models for Multilingual Sentiment Analysis
- Authors: Md Arid Hasan
- Abstract summary: We explore sentiment analysis on tweet texts from SemEval-17 and the Arabic Sentiment Tweet dataset.
Our findings include monolingual models exhibiting superior performance and ensemble models outperforming the baseline.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid advancement of social media enables us to analyze user opinions. In
recent times, sentiment analysis has revealed a prominent research gap in
understanding human sentiment based on the content shared on social media.
Although sentiment analysis for commonly spoken languages has advanced
significantly, low-resource languages such as Arabic continue to receive little
research attention due to resource limitations. In this study, we explore sentiment
analysis on tweet texts from SemEval-17 and the Arabic Sentiment Tweet dataset.
Moreover, we investigate four pretrained language models and propose two
ensemble language models. Our findings include monolingual models exhibiting
superior performance and ensemble models outperforming the baseline, with the
majority-voting ensemble performing best on the English dataset.
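The majority-voting ensemble mentioned above can be illustrated with a minimal sketch: each member model predicts a sentiment label per example, and the ensemble outputs the label most models agree on. The label names and model outputs below are hypothetical placeholders, not the paper's actual models or data.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the majority label among per-model predictions for one example.

    `predictions` is a list of labels, one from each ensemble member
    (e.g. ["positive", "negative", "positive"]). Ties are broken in
    favor of the label that appears earliest in the list.
    """
    counts = Counter(predictions)
    top_count = max(counts.values())
    # Scan in input order so ties fall to the earliest-seen label.
    for label in predictions:
        if counts[label] == top_count:
            return label

# Each inner list holds one model's predictions over the same three examples.
model_outputs = [
    ["positive", "negative", "neutral"],   # model A
    ["positive", "positive", "neutral"],   # model B
    ["negative", "positive", "neutral"],   # model C
]

# Transpose so each tuple gathers the three models' votes for one example.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs)]
# ensemble -> ["positive", "positive", "neutral"]
```

A weighted variant (e.g. weighting each model's vote by its validation F1) is a common alternative when ensemble members differ substantially in quality.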
Related papers
- Sentiment Analysis Across Languages: Evaluation Before and After Machine Translation to English [0.0]
This paper examines the performance of transformer models in Sentiment Analysis tasks across multilingual datasets and text that has undergone machine translation.
By comparing the effectiveness of these models in different linguistic contexts, we gain insights into their performance variations and potential implications for sentiment analysis across diverse languages.
arXiv Detail & Related papers (2024-05-05T10:52:09Z) - M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets [4.478789600295492]
This paper transforms an existing textual Twitter sentiment dataset into a multimodal format through a straightforward curation process.
Our work opens up new avenues for sentiment-related research within the research community.
arXiv Detail & Related papers (2024-04-02T09:11:58Z) - Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon [78.12363425794214]
We focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets.
We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets.
arXiv Detail & Related papers (2024-02-03T10:41:05Z) - Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis [6.471458199049549]
In this study, we present a sizeable manually annotated dataset encompassing 33,606 Bangla news tweets and Facebook comments.
We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and Bloomz.
Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero and few-shot scenarios.
arXiv Detail & Related papers (2023-08-21T15:19:10Z) - DN at SemEval-2023 Task 12: Low-Resource Language Text Classification via Multilingual Pretrained Language Model Fine-tuning [0.0]
Most existing models and datasets for sentiment analysis are developed for high-resource languages, such as English and Chinese.
The AfriSenti-SemEval 2023 Shared Task 12 aims to fill this gap by evaluating sentiment analysis models on low-resource African languages.
We present our solution to the shared task, where we employed different multilingual XLM-R models with a classification head trained on various data.
arXiv Detail & Related papers (2023-05-04T07:28:45Z) - Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study provides an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - XPersona: Evaluating Multilingual Personalized Chatbot [76.00426517401894]
We propose a multi-lingual extension of Persona-Chat, namely XPersona.
Our dataset includes persona conversations in six different languages other than English for building and evaluating multilingual personalized agents.
arXiv Detail & Related papers (2020-03-17T07:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.