Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with
Wider Topic Analysis
- URL: http://arxiv.org/abs/2403.01921v1
- Date: Mon, 4 Mar 2024 10:37:48 GMT
- Title: Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with
Wider Topic Analysis
- Authors: Latifah Almurqren, Ryan Hodgson, Alexandra Cristea
- Abstract summary: The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020.
The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches.
There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA.
- Score: 49.1574468325115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentiment analysis (SA) has been, and is still, a thriving research area.
However, the task of Arabic sentiment analysis (ASA) is still underrepresented
in the body of research. This study offers the first in-depth and in-breadth
analysis of existing ASA studies of textual content and identifies their common
themes, domains of application, methods, approaches, technologies and
algorithms used. The in-depth study manually analyses 133 ASA papers published
in the English language between 2002 and 2020 from four academic databases
(SAGE, IEEE, Springer, WILEY) and from Google Scholar. The in-breadth study
uses modern, automatic machine learning techniques, such as topic modelling and
temporal analysis, on Open Access resources, to reinforce themes and trends
identified by the prior study, on 2297 ASA publications between 2010 and 2020. The
main findings show the different approaches used for ASA: machine learning,
lexicon-based and hybrid approaches. Other findings include ASA 'winning'
algorithms (SVM, NB, hybrid methods). Deep learning methods, such as LSTM, can
provide higher accuracy, but for ASA sometimes the corpora are not large enough
to support them. Additionally, whilst there are some ASA corpora and lexicons,
more are required. Specifically, Arabic tweets corpora and datasets are
currently only moderately sized. Moreover, Arabic lexicons that have high
coverage contain only Modern Standard Arabic (MSA) words, and those with Arabic
dialects are quite small. Thus, new corpora need to be created. In addition,
ASA tools are severely lacking. There is a need to develop ASA tools
that can be used in industry, as well as in academia, for Arabic text SA.
Hence, our study offers insights into the challenges associated with ASA
research, such as the lack of Dialectal Arabic resources, Arabic tweets,
corpora and datasets for SA, and provides suggestions for ways to move the
field forward.
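To make the lexicon-based approach named in the abstract concrete, here is a minimal sketch in Python. It is not taken from the surveyed papers: the tiny word lists, the negation rule, and the function names `lexicon_score` and `classify` are all assumptions chosen for illustration. A real ASA lexicon would be far larger and would need normalisation and dialect coverage.

```python
# Hypothetical toy lexicon, hand-picked for illustration only.
# Positive entries: beautiful, excellent, wonderful
POSITIVE = {"جميل", "ممتاز", "رائع"}
# Negative entries: bad, poor, boring
NEGATIVE = {"سيء", "رديء", "ممل"}
# Simple negation words: not, non-, no
NEGATORS = {"ليس", "غير", "لا"}


def lexicon_score(text: str) -> int:
    """Sum word polarities; a negator flips the polarity of the next token."""
    score = 0
    negate = False
    for token in text.split():
        if token in NEGATORS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = int(token in POSITIVE) - int(token in NEGATIVE)
        score += -polarity if negate else polarity
        # Negation only applies to the token immediately after the negator.
        negate = False
    return score


def classify(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["الفيلم رائع", "الطعام سيء", "الفيلم ليس رائع"]:
        print(sentence, "->", classify(sentence))
```

Any word missing from the lexicon contributes nothing to the score, which illustrates why the limited coverage of dialect words noted in the abstract matters: a dialectal sentence may simply come out as neutral.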
Related papers
- Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion [55.27025066199226]
This paper addresses the need for democratizing large language models (LLM) in the Arab world.
One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding.
Inspired by the vocabulary learning during Second Language (Arabic) Acquisition for humans, the released AraLLaMA employs progressive vocabulary expansion.
arXiv Detail & Related papers (2024-12-16T19:29:06Z)
- From Multiple-Choice to Extractive QA: A Case Study for English and Arabic [51.13706104333848]
We explore the feasibility of repurposing an existing multilingual dataset for a new NLP task.
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic.
We aim to help others adapt our approach for the remaining 120 BELEBELE language variants, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
arXiv Detail & Related papers (2023-09-24T19:26:53Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
- Exploring Sentiment Analysis Techniques in Natural Language Processing: A Comprehensive Review [0.15229257192293202]
Sentiment analysis (SA) is the automated process of detecting and understanding the emotions conveyed through written text.
SA has gained significant popularity in the field of Natural Language Processing (NLP).
This study aims to enhance the efficiency and accuracy of SA processes, leading to smoother and error-free outcomes.
arXiv Detail & Related papers (2023-05-24T07:48:41Z)
- H-AES: Towards Automated Essay Scoring for Hindi [33.755800922763946]
We reproduce and compare state-of-the-art methods for Automated Essay Scoring (AES) in the Hindi domain.
We employ classical feature-based Machine Learning (ML) and advanced end-to-end models, including LSTM Networks and Fine-Tuned Transformer Architecture.
We train and evaluate our models using translated English essays and empirically measure their performance on our own small-scale, real-world Hindi corpus.
arXiv Detail & Related papers (2023-02-28T15:14:15Z)
- Survey of Aspect-based Sentiment Analysis Datasets [55.61047894397937]
Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews.
Numerous yet scattered corpora for ABSA make it difficult for researchers to identify corpora best suited for a specific ABSA subtask quickly.
This study aims to present a database of corpora that can be used to train and assess autonomous ABSA systems.
arXiv Detail & Related papers (2022-04-11T16:23:36Z)
- Pre-trained Transformer-Based Approach for Arabic Question Answering: A Comparative Study [0.5801044612920815]
We evaluate the state-of-the-art pre-trained transformers models for Arabic QA using four reading comprehension datasets.
We fine-tuned and compared the performance of the AraBERTv2-base model, AraBERTv0.2-large model, and AraELECTRA model.
arXiv Detail & Related papers (2021-11-10T12:33:18Z)
- Sentiment Analysis in Poems in Misurata Sub-dialect -- A Sentiment Detection in an Arabic Sub-dialect [0.0]
This study focuses on detecting sentiment in poems written in Misurata Arabic sub-dialect spoken in Libya.
The tools used to detect sentiment from the dataset are Sklearn as well as the Mazajak sentiment tool.
arXiv Detail & Related papers (2021-09-15T10:42:39Z)
- Arabic aspect based sentiment analysis using BERT [0.0]
This article explores the modeling capabilities of contextual embeddings from pre-trained language models, such as BERT.
We are building a simple but effective BERT-based neural baseline to handle this task.
Our BERT architecture with a simple linear classification layer surpassed the state-of-the-art works, according to the experimental results.
arXiv Detail & Related papers (2021-07-28T11:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.