Arabic Sentiment Analysis with Noisy Deep Explainable Model
- URL: http://arxiv.org/abs/2309.13731v2
- Date: Wed, 29 Nov 2023 11:52:58 GMT
- Title: Arabic Sentiment Analysis with Noisy Deep Explainable Model
- Authors: Md. Atabuzzaman, Md Shajalal, Maksuda Bilkis Baby, Alexander Boden
- Abstract summary: This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
- Score: 48.22321420680046
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sentiment Analysis (SA) is an indispensable task for many real-world
applications. Compared to low-resource languages (e.g., Arabic, Bengali), most
SA research is conducted on high-resource languages (e.g., English, Chinese).
Moreover, Arabic sentiment analysis methods that exploit advanced artificial
intelligence (AI)-based approaches behave like black boxes: the reasons behind
their predictions are difficult to understand. This paper proposes an
explainable sentiment classification framework for the Arabic language by
introducing a noise layer into Bi-Directional Long Short-Term Memory (BiLSTM)
and Convolutional Neural Network (CNN)-BiLSTM models to mitigate the
over-fitting problem. The proposed framework can explain specific predictions
by training a local surrogate explainable model to show why a particular
sentiment (positive or negative) is predicted. We carried out experiments on
public benchmark Arabic SA datasets. The results show that adding noise layers
improves performance in Arabic sentiment analysis by reducing overfitting, and
that our method outperforms several known state-of-the-art methods. In
addition, the introduced explainability combined with the noise layer could
make the model more transparent and accountable, and hence help the adoption of
AI-enabled systems in practice.
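The abstract does not give implementation details, so the following is a minimal sketch, assuming a Keras-style implementation with illustrative hyperparameters: a CNN-BiLSTM sentiment classifier with an additive Gaussian noise layer after the embedding, plus a LIME-based local surrogate explanation for individual predictions (the abstract describes a local surrogate model but does not name LIME).

```python
# Minimal sketch (not the authors' code): CNN-BiLSTM sentiment classifier with a
# Gaussian noise layer as a regularizer, plus a LIME local surrogate explanation.
# Hyperparameters, the tokenizer setup, and the choice of LIME are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from lime.lime_text import LimeTextExplainer

VOCAB_SIZE = 20000
MAX_LEN = 100

def build_noisy_cnn_bilstm(vocab_size=VOCAB_SIZE, max_len=MAX_LEN):
    """Binary sentiment classifier; GaussianNoise perturbs the embeddings during
    training only, discouraging the network from over-fitting."""
    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        layers.Embedding(vocab_size, 128),
        layers.GaussianNoise(0.1),                      # the noise layer
        layers.Conv1D(64, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=VOCAB_SIZE)
# tokenizer.fit_on_texts(train_texts)   # train_texts: Arabic reviews (not shown)
model = build_noisy_cnn_bilstm()
# model.fit(x_train, y_train, validation_split=0.1, epochs=5, batch_size=64)

def predict_proba(texts):
    """Map raw texts to [P(negative), P(positive)] rows for the LIME explainer."""
    seqs = tokenizer.texts_to_sequences(texts)
    padded = tf.keras.preprocessing.sequence.pad_sequences(seqs, maxlen=MAX_LEN)
    p_pos = model.predict(padded, verbose=0).reshape(-1, 1)
    return np.hstack([1.0 - p_pos, p_pos])

# Local surrogate explanation: LIME fits an interpretable linear model around a
# single prediction and reports the words pushing it towards each class.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
# explanation = explainer.explain_instance(review_text, predict_proba, num_features=10)
# explanation.as_list()   # [(word, weight), ...] behind this particular prediction
```

Note that Keras applies GaussianNoise only at training time, so inference behaviour is unchanged while the perturbation acts purely as a regularizer.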
Related papers
- Investigating the Impact of Language-Adaptive Fine-Tuning on Sentiment Analysis in Hausa Language Using AfriBERTa [2.5055584842618175]
Sentiment analysis (SA) plays a vital role in Natural Language Processing (NLP) by identifying sentiments expressed in text.
This study investigates the effectiveness of Language-Adaptive Fine-Tuning (LAFT) to improve SA performance in Hausa.
arXiv Detail & Related papers (2025-01-19T11:52:46Z)
- Can Large Language Models Predict the Outcome of Judicial Decisions? [0.0]
Large Language Models (LLMs) have shown exceptional capabilities in Natural Language Processing (NLP)
We benchmark state-of-the-art open-source LLMs, including LLaMA-3.2-3B and LLaMA-3.1-8B, under varying configurations.
Our results demonstrate that fine-tuned smaller models achieve comparable performance to larger models in task-specific contexts.
arXiv Detail & Related papers (2025-01-15T11:32:35Z)
- Resource-Aware Arabic LLM Creation: Model Adaptation, Integration, and Multi-Domain Testing [0.0]
This paper presents a novel approach to fine-tuning the Qwen2-1.5B model for Arabic language processing using Quantized Low-Rank Adaptation (QLoRA) on a system with only 4GB VRAM.
We detail the process of adapting this large language model to the Arabic domain, using diverse datasets including Bactrian, OpenAssistant, and Wikipedia Arabic corpora.
Experimental results over 10,000 training steps show significant performance improvements, with the final loss converging to 0.1083.
arXiv Detail & Related papers (2024-12-23T13:08:48Z)
- A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Languages Models and Explainable AI [0.0]
South Africa and the DRC present a complex linguistic landscape with languages such as Zulu, Sepedi, Afrikaans, French, English, and Tshiluba.
This study develops a multilingual lexicon designed for French and Tshiluba, now expanded to include translations in English, Afrikaans, Sepedi, and Zulu.
A comprehensive testing corpus is created to support translation and sentiment analysis tasks, with machine learning models trained to predict sentiment.
arXiv Detail & Related papers (2024-11-06T23:41:18Z)
- Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess the linguistic competence of language models (LMs).
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
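As a rough illustration of the noise-stability idea summarized in the entry above (not the paper's LNSR implementation): inject Gaussian noise into a hidden representation and penalize how much the downstream output changes. The tiny encoder, noise scale, and penalty weight below are illustrative assumptions.

```python
# Toy sketch of noise-stability regularization (illustrative only): perturb a
# hidden representation with standard Gaussian noise and penalize the resulting
# change in the subsequent layer's output.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.layer1 = nn.Linear(dim, dim)
        self.layer2 = nn.Linear(dim, dim)

    def forward(self, x, noise_std=0.0):
        h = torch.tanh(self.layer1(x))
        if noise_std > 0:
            h = h + noise_std * torch.randn_like(h)   # inject Gaussian noise
        return torch.tanh(self.layer2(h))

def noise_stability_penalty(model, x, noise_std=0.1):
    """L2 distance between clean and noise-perturbed outputs."""
    return ((model(x) - model(x, noise_std=noise_std)) ** 2).mean()

model = TinyEncoder()
x = torch.randn(32, 128)
task_loss = nn.functional.mse_loss(model(x), torch.zeros(32, 128))  # placeholder task loss
loss = task_loss + 0.1 * noise_stability_penalty(model, x)          # regularized objective
loss.backward()
```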
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity and overstability causing samples with high accuracies.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Arabic aspect based sentiment analysis using BERT [0.0]
This article explores the modeling capabilities of contextual embeddings from pre-trained language models, such as BERT.
We build a simple but effective BERT-based neural baseline to handle this task.
Our BERT architecture with a simple linear classification layer surpassed the state-of-the-art works, according to the experimental results.
arXiv Detail & Related papers (2021-07-28T11:34:00Z)
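A baseline of the kind described in the entry above (a pre-trained BERT encoder with a simple linear classification head) can be sketched as follows; the Arabic checkpoint name and the two-label setup are assumptions, and the cited paper specifically targets aspect-based sentiment.

```python
# Rough sketch (not the cited paper's code): an Arabic BERT encoder with a
# linear classification head, as provided by AutoModelForSequenceClassification.
# The checkpoint and binary label mapping are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "aubmindlab/bert-base-arabertv02"   # assumed Arabic BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# After fine-tuning on labelled Arabic sentiment data (not shown), inference is:
inputs = tokenizer("الخدمة ممتازة", return_tensors="pt")   # "The service is excellent"
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()   # e.g. 0 = negative, 1 = positive
```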
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
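As a generic illustration of the idea in the last entry (the cited method is more involved), a pretrained language model can be sampled to produce variants of seed utterances and enlarge a scarce SLU training set; the GPT-2 checkpoint and decoding settings below are assumptions.

```python
# Generic illustration only (not the cited paper's method): sample new utterance
# variants from a pretrained language model to augment scarce SLU training data.
from transformers import pipeline, set_seed

set_seed(0)
generator = pipeline("text-generation", model="gpt2")   # assumed checkpoint
seed_utterance = "book a table for two at"
samples = generator(seed_utterance, max_new_tokens=12,
                    num_return_sequences=3, do_sample=True)
for s in samples:
    print(s["generated_text"])   # candidate augmented utterances (filtered downstream)
```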