Related papers: Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis

URL: http://arxiv.org/abs/2403.01921v1
Date: Mon, 4 Mar 2024 10:37:48 GMT
Title: Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis
Authors: Latifah Almurqren, Ryan Hodgson, Alexandra Cristea
Abstract summary: The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020. The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches. There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA.
Score: 49.1574468325115
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sentiment analysis (SA) has been, and is still, a thriving research area. However, the task of Arabic sentiment analysis (ASA) is still underrepresented in the body of research. This study offers the first in-depth and in-breadth analysis of existing ASA studies of textual content and identifies their common themes, domains of application, methods, approaches, technologies and algorithms used. The in-depth study manually analyses 133 ASA papers published in the English language between 2002 and 2020 from four academic databases (SAGE, IEEE, Springer, WILEY) and from Google Scholar. The in-breadth study applies automatic machine learning techniques, such as topic modelling and temporal analysis, to Open Access resources covering 2297 ASA publications between 2010 and 2020, to reinforce the themes and trends identified by the in-depth study. The main findings show the different approaches used for ASA: machine learning, lexicon-based and hybrid approaches. Other findings include ASA 'winning' algorithms (SVM, NB, hybrid methods). Deep learning methods, such as LSTM, can provide higher accuracy, but ASA corpora are sometimes not large enough to support them. Additionally, whilst there are some ASA corpora and lexicons, more are required. Specifically, Arabic tweet corpora and datasets are currently only moderately sized. Moreover, Arabic lexicons that have high coverage contain only Modern Standard Arabic (MSA) words, and those with Arabic dialects are quite small. Thus, new corpora need to be created. ASA tools, on the other hand, are severely lacking. There is a need to develop ASA tools that can be used in industry, as well as in academia, for Arabic text SA. Hence, our study offers insights into the challenges associated with ASA research, such as the lack of Dialectal Arabic resources, Arabic tweet corpora and datasets for SA, and provides suggestions for ways to move the field forward.

Related papers

ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics [0.6323908398583084]
We introduce ADAB (Arabic Politeness dataset), a new annotated Arabic dataset collected from four online platforms. The dataset was annotated based on Arabic linguistic traditions and pragmatic theory, resulting in three classes: polite, impolite, and neutral. It contains 10,000 samples with linguistic feature annotations across 16 politeness categories and achieves substantial inter-annotator agreement.
arXiv Detail & Related papers (2026-02-14T19:58:53Z)
The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text [0.05399757380241794]
Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text. This paper presents a comprehensive investigation of Arabic machine-generated text. We develop BERT-based detection models that achieve exceptional performance in formal contexts.
arXiv Detail & Related papers (2025-05-29T09:24:00Z)
Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion [55.27025066199226]
This paper addresses the need for democratizing large language models (LLM) in the Arab world. One practical objective for an Arabic LLM is to utilize an Arabic-specific vocabulary for the tokenizer that could speed up decoding. Inspired by the vocabulary learning during Second Language (Arabic) Acquisition for humans, the released AraLLaMA employs progressive vocabulary expansion.
arXiv Detail & Related papers (2024-12-16T19:29:06Z)
ROAST: Review-level Opinion Aspect Sentiment Target Joint Detection for ABSA [50.90538760832107]
This research presents a novel task, Review-Level Opinion Aspect Sentiment Target (ROAST). ROAST seeks to close the gap between sentence-level and text-level ABSA by identifying every ABSA constituent at the review level. We extend the available datasets to enable ROAST, addressing the drawbacks noted in previous research.
arXiv Detail & Related papers (2024-05-30T17:29:15Z)
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic [51.13706104333848]
We explore the feasibility of repurposing an existing multilingual dataset for a new NLP task. We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic. We aim to help others adapt our approach for the remaining 120 BELEBELE language variants, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language. The proposed framework can explain specific predictions by training a local surrogate explainable model. We carried out experiments on public benchmark Arabic SA datasets.
arXiv Detail & Related papers (2023-09-24T19:26:53Z)
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts. We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub. We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
Exploring Sentiment Analysis Techniques in Natural Language Processing: A Comprehensive Review [0.15229257192293202]
Sentiment analysis (SA) is the automated process of detecting and understanding the emotions conveyed through written text. SA has gained significant popularity in the field of Natural Language Processing (NLP). This study aims to enhance the efficiency and accuracy of SA processes, leading to smoother and error-free outcomes.
arXiv Detail & Related papers (2023-05-24T07:48:41Z)
H-AES: Towards Automated Essay Scoring for Hindi [33.755800922763946]
We reproduce and compare state-of-the-art methods for Automated Essay Scoring (AES) in the Hindi domain. We employ classical feature-based Machine Learning (ML) and advanced end-to-end models, including LSTM Networks and Fine-Tuned Transformer Architecture. We train and evaluate our models using translated English essays and empirically measure their performance on our own small-scale, real-world Hindi corpus.
arXiv Detail & Related papers (2023-02-28T15:14:15Z)
Survey of Aspect-based Sentiment Analysis Datasets [55.61047894397937]
Aspect-based sentiment analysis (ABSA) is a natural language processing problem that requires analyzing user-generated reviews. Numerous yet scattered corpora for ABSA make it difficult for researchers to identify corpora best suited for a specific ABSA subtask quickly. This study aims to present a database of corpora that can be used to train and assess autonomous ABSA systems.
arXiv Detail & Related papers (2022-04-11T16:23:36Z)
MACRONYM: A Large-Scale Dataset for Multilingual and Multi-Domain Acronym Extraction [66.60031336330547]
Acronyms and their expanded forms are necessary for various NLP applications. One limitation of existing AE research is that it is restricted to the English language and certain domains. The lack of annotated datasets in multiple languages and domains has been a major obstacle to research in this area.
arXiv Detail & Related papers (2022-02-19T23:08:38Z)
Pre-trained Transformer-Based Approach for Arabic Question Answering : A Comparative Study [0.5801044612920815]
We evaluate state-of-the-art pre-trained transformer models for Arabic QA using four reading comprehension datasets. We fine-tuned and compared the performance of the AraBERTv2-base model, AraBERTv0.2-large model, and AraELECTRA model.
arXiv Detail & Related papers (2021-11-10T12:33:18Z)
Sentiment Analysis in Poems in Misurata Sub-dialect -- A Sentiment Detection in an Arabic Sub-dialect [0.0]
This study focuses on detecting sentiment in poems written in the Misurata Arabic sub-dialect spoken in Libya. The tools used to detect sentiment from the dataset are Sklearn as well as the Mazajak sentiment tool.
arXiv Detail & Related papers (2021-09-15T10:42:39Z)
Arabic aspect based sentiment analysis using BERT [0.0]
This article explores the modeling capabilities of contextual embeddings from pre-trained language models, such as BERT. We are building a simple but effective BERT-based neural baseline to handle this task. Our BERT architecture with a simple linear classification layer surpassed the state-of-the-art works, according to the experimental results.
arXiv Detail & Related papers (2021-07-28T11:34:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.