Enhancing Multilingual Sentiment Analysis with Explainability for Sinhala, English, and Code-Mixed Content
- URL: http://arxiv.org/abs/2504.13545v1
- Date: Fri, 18 Apr 2025 08:21:12 GMT
- Title: Enhancing Multilingual Sentiment Analysis with Explainability for Sinhala, English, and Code-Mixed Content
- Authors: Azmarah Rizvi, Navojith Thamindu, A. M. N. H. Adhikari, W. P. U. Senevirathna, Dharshana Kasthurirathna, Lakmini Abeywardhana
- Abstract summary: Existing models struggle with low-resource languages like Sinhala and lack interpretability for practical use. This research develops a hybrid aspect-based sentiment analysis framework that enhances multilingual capabilities with explainable outputs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sentiment analysis is crucial for brand reputation management in the banking sector, where customer feedback spans English, Sinhala, Singlish, and code-mixed text. Existing models struggle with low-resource languages like Sinhala and lack interpretability for practical use. This research develops a hybrid aspect-based sentiment analysis framework that enhances multilingual capabilities with explainable outputs. Using cleaned banking customer reviews, we fine-tune XLM-RoBERTa for Sinhala and code-mixed text, integrate domain-specific lexicon correction, and employ BERT-base-uncased for English. The system classifies sentiment (positive, neutral, negative) with confidence scores, while SHAP and LIME improve interpretability by providing real-time sentiment explanations. Experimental results show that our approaches outperform traditional transformer-based classifiers, achieving 92.3 percent accuracy and an F1-score of 0.89 in English and 88.4 percent in Sinhala and code-mixed content. An explainability analysis reveals key sentiment drivers, improving trust and transparency. A user-friendly interface delivers aspect-wise sentiment insights, ensuring accessibility for businesses. This research contributes to robust, transparent sentiment analysis for financial applications by bridging gaps in multilingual, low-resource NLP and explainability.
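The abstract describes two model paths (XLM-RoBERTa for Sinhala and code-mixed text, BERT-base-uncased for English) and a three-way sentiment label reported with a confidence score. A minimal sketch of that flow is given below. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how inputs are routed, so the Unicode-script check in `route_model` is a hypothetical rule, the returned strings are descriptive labels rather than checkpoint names, and the confidence is assumed to be the softmax probability of the predicted class. The SHAP and LIME explanation step is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

# Three sentiment classes named in the abstract; the index order is an assumption.
LABELS: Tuple[str, ...] = ("negative", "neutral", "positive")


def contains_sinhala(text: str) -> bool:
    """Return True if any character lies in the Unicode Sinhala block (U+0D80-U+0DFF)."""
    return any("\u0d80" <= ch <= "\u0dff" for ch in text)


def route_model(text: str) -> str:
    """Hypothetical routing rule: Sinhala script goes to the multilingual model."""
    return "XLM-RoBERTa" if contains_sinhala(text) else "BERT-base-uncased"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw classifier scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits: Sequence[float]) -> Tuple[str, float]:
    """Pick the highest-probability class and report its probability as confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


if __name__ == "__main__":
    for review in ("The mobile app is great", "සේවාව හොඳයි"):
        print(review, "->", route_model(review))
    # Made-up logits standing in for a fine-tuned classifier's output.
    print(label_with_confidence([0.1, 0.2, 2.5]))
```

A script check of this kind cannot recognise romanized Sinhala (Singlish), which is written in Latin characters and would be sent to the English path; a real system would need a trained language identifier or would send all non-English input to the multilingual model.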
Related papers
- Emotion Classification In-Context in Spanish [0.0]
We classify customer feedback in Spanish into three emotion categories--positive, neutral, and negative--using advanced NLP and ML techniques.
Traditional methods translate feedback from widely spoken languages to less common ones, resulting in a loss of semantic integrity.
We propose a hybrid approach that combines TF-IDF with BERT embeddings, effectively transforming Spanish text into rich numerical representations.
arXiv Detail & Related papers (2025-05-26T23:09:41Z)
- Full-Parameter Continual Pretraining of Gemma2: Insights into Fluency and Domain Knowledge [0.0]
We empirically investigate the relationship between linguistic fluency and domain knowledge in the context of continual learning with large language models (LLMs).
Specifically, we enhance the linguistic fluency of the Gemma2 LLM for the Lithuanian language by autoregressively pretraining its full parameter set on the first 10% of the Lithuanian language component of the CulturaX dataset.
To prevent catastrophic forgetting of the model's existing domain knowledge, we apply Elastic Weight Consolidation (EWC).
In the post-training evaluations, we assess linguistic fluency through perplexity and evaluate domain knowledge using accuracy on a suite of language understanding benchmarks.
arXiv Detail & Related papers (2025-05-09T10:43:37Z)
- Keyword Extraction, and Aspect Classification in Sinhala, English, and Code-Mixed Content [0.0]
This study introduces a hybrid NLP method to improve keyword extraction, content filtering, and aspect-based classification of banking content.
The present framework offers an accurate and scalable solution for brand reputation monitoring in code-mixed and low-resource banking environments.
arXiv Detail & Related papers (2025-04-14T20:01:34Z)
- Do LLMs Understand Your Translations? Evaluating Paragraph-level MT with Question Answering [68.3400058037817]
We introduce TREQA (Translation Evaluation via Question-Answering), a framework that extrinsically evaluates translation quality.
We show that TREQA is competitive with and, in some cases, outperforms state-of-the-art neural and LLM-based metrics in ranking alternative paragraph-level translations.
arXiv Detail & Related papers (2025-04-10T09:24:54Z)
- Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics [0.20971479389679337]
Large language models (LLMs) are increasingly utilized for machine translation, yet their predictions often exhibit uncertainties that hinder interpretability and user trust.
This paper addresses two primary objectives: (1) providing users with token-level insights into model confidence and (2) developing a web-based visualization tool to quantify and represent translation uncertainties.
arXiv Detail & Related papers (2025-01-26T17:14:51Z)
- A Multilingual Sentiment Lexicon for Low-Resource Language Translation using Large Languages Models and Explainable AI [0.0]
South Africa and the DRC present a complex linguistic landscape with languages such as Zulu, Sepedi, Afrikaans, French, English, and Tshiluba.
This study develops a multilingual lexicon designed for French and Tshiluba, now expanded to include translations in English, Afrikaans, Sepedi, and Zulu.
A comprehensive testing corpus is created to support translation and sentiment analysis tasks, with machine learning models trained to predict sentiment.
arXiv Detail & Related papers (2024-11-06T23:41:18Z)
- LLM-based Translation Inference with Iterative Bilingual Understanding [52.46978502902928]
We propose a novel Iterative Bilingual Understanding Translation method based on the cross-lingual capabilities of large language models (LLMs).
The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately.
The proposed IBUT outperforms several strong comparison methods.
arXiv Detail & Related papers (2024-10-16T13:21:46Z)
- A hybrid transformer and attention based recurrent neural network for robust and interpretable sentiment analysis of tweets [0.3495246564946556]
Existing models face challenges with linguistic diversity, generalizability, and explainability.
We propose TRABSA, a hybrid framework integrating transformer-based architectures, attention mechanisms, and BiLSTM networks.
We bridge gaps in sentiment analysis benchmarks, ensuring state-of-the-art accuracy.
arXiv Detail & Related papers (2024-03-30T09:20:43Z)
- Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
arXiv Detail & Related papers (2023-09-24T19:26:53Z)
- Syntactic Knowledge via Graph Attention with BERT in Machine Translation [0.0]
We propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios.
Our experiments use gold syntax-annotation sentences and Quality Estimation (QE) model to obtain interpretability of translation quality improvement.
Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores.
arXiv Detail & Related papers (2023-05-22T18:56:14Z)
- Consistency Analysis of ChatGPT [65.268245109828]
This paper investigates the trustworthiness of ChatGPT and GPT-4 regarding logically consistent behaviour.
Our findings suggest that while both models appear to show an enhanced language understanding and reasoning ability, they still frequently fall short of generating logically consistent predictions.
arXiv Detail & Related papers (2023-03-11T01:19:01Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- Self-Attention with Cross-Lingual Position Representation [112.05807284056337]
Position encoding (PE) is used to preserve the word order information for natural language processing tasks, generating fixed position indices for input sequences.
Due to word order divergences in different languages, modeling the cross-lingual positional relationships might help self-attention networks (SANs) tackle this problem.
We augment SANs with cross-lingual position representations to model the bilingually aware latent structure for the input sentence.
arXiv Detail & Related papers (2020-04-28T05:23:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.