BAN-ABSA: An Aspect-Based Sentiment Analysis dataset for Bengali and
its baseline evaluation
- URL: http://arxiv.org/abs/2012.00288v1
- Date: Tue, 1 Dec 2020 06:09:44 GMT
- Title: BAN-ABSA: An Aspect-Based Sentiment Analysis dataset for Bengali and
its baseline evaluation
- Authors: Mahfuz Ahmed Masum, Sheikh Junayed Ahmed, Ayesha Tasnim, Md Saiful
Islam
- Abstract summary: We present a manually annotated Bengali dataset of high quality, BAN-ABSA, which is annotated with aspect and its associated sentiment by 3 native Bengali speakers.
The dataset consists of 2,619 positive, 4,721 negative and 1,669 neutral data samples from 9,009 unique comments gathered from some famous Bengali news portals.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Owing to the rapid growth of user comments on social media, news
portals, and online product reviews, sentiment analysis (SA) has captured
substantial interest from researchers. As application domains multiply, SA
work aims not only to predict the sentiment of a sentence or document but also
to provide the necessary detail on its different aspects (i.e., aspect-based
sentiment analysis). A considerable number of datasets for SA and aspect-based
sentiment analysis (ABSA) are available for English and other well-known
European languages. In this paper, we present BAN-ABSA, a high-quality,
manually annotated Bengali dataset in which each comment is labeled with an
aspect and its associated sentiment by 3 native Bengali speakers. The dataset
consists of 2,619 positive, 4,721 negative, and 1,669 neutral data samples
drawn from 9,009 unique comments gathered from popular Bengali news portals.
In addition, we conducted a baseline evaluation focused on deep learning
models, achieving an accuracy of 78.75% for aspect term extraction and 71.08%
for sentiment classification. Experiments on the BAN-ABSA dataset show that
the CNN model is better in terms of accuracy, though Bi-LSTM significantly
outperforms CNN in terms of average F1-score.
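The abstract describes a two-stage setup: aspect term extraction followed by 3-way sentiment classification of each extracted aspect. The toy sketch below illustrates only the task structure and label format; the lexicons, rules, and English tokens are hypothetical stand-ins for the actual CNN/Bi-LSTM models and the BAN-ABSA schema.

```python
# Illustrative two-stage ABSA pipeline. Stage 1 extracts aspect terms;
# stage 2 assigns each aspect a positive/negative/neutral label.
# All lexicons here are toy assumptions, not part of BAN-ABSA.

ASPECT_LEXICON = {"camera", "battery", "screen"}   # toy aspect terms
NEGATIVE_CUES = {"bad", "poor", "terrible"}
POSITIVE_CUES = {"good", "great", "excellent"}

def extract_aspects(tokens):
    """Stage 1: aspect term extraction (here, a simple lexicon lookup)."""
    return [t for t in tokens if t in ASPECT_LEXICON]

def classify_sentiment(tokens):
    """Stage 2: 3-way sentiment over the comment's tokens."""
    if any(t in NEGATIVE_CUES for t in tokens):
        return "negative"
    if any(t in POSITIVE_CUES for t in tokens):
        return "positive"
    return "neutral"

def absa(comment):
    """Return (aspect, sentiment) pairs for one comment."""
    tokens = comment.lower().split()
    return [(a, classify_sentiment(tokens)) for a in extract_aspects(tokens)]

print(absa("The battery is terrible"))  # [('battery', 'negative')]
```

In the paper's baselines, the lexicon lookup would be replaced by a learned sequence tagger and the cue rules by a CNN or Bi-LSTM classifier.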
Related papers
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents using Not Safe For Work (NSFW) scores computed from images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
- SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL)
SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG)
arXiv Detail & Related papers (2023-10-27T06:48:48Z)
- Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
arXiv Detail & Related papers (2023-09-24T19:26:53Z)
- BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews [1.869097450593631]
We present a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral.
We employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT.
Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features.
arXiv Detail & Related papers (2023-05-11T06:27:38Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We demonstrate the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Arabic aspect based sentiment analysis using BERT [0.0]
This article explores the modeling capabilities of contextual embeddings from pre-trained language models, such as BERT.
We are building a simple but effective BERT-based neural baseline to handle this task.
Our BERT architecture with a simple linear classification layer surpassed the state-of-the-art works, according to the experimental results.
arXiv Detail & Related papers (2021-07-28T11:34:00Z)
- Fine-tuning Pretrained Multilingual BERT Model for Indonesian Aspect-based Sentiment Analysis [0.0]
Previous research on Aspect-based Sentiment Analysis (ABSA) for Indonesian reviews in hotel domain has been conducted using CNN and XGBoost.
In this paper, we incorporate one of the foremost language representation models, BERT, to perform ABSA on an Indonesian reviews dataset.
arXiv Detail & Related papers (2021-03-05T15:05:51Z)
- Sentiment analysis in Bengali via transfer learning using multi-lingual BERT [0.9883261192383611]
In this paper, we present manually tagged 2-class and 3-class SA datasets in Bengali.
We also demonstrate that the multi-lingual BERT model with relevant extensions can be trained via the approach of transfer learning.
This deep learning model achieves an accuracy of 71% for 2-class sentiment classification compared to the current state-of-the-art accuracy of 68%.
arXiv Detail & Related papers (2020-12-03T10:21:11Z)
- Exploiting BERT to improve aspect-based sentiment analysis performance on Persian language [0.0]
This research shows the potential of using pre-trained BERT model and taking advantage of using sentence-pair input on an ABSA task.
The results indicate that employing Pars-BERT pre-trained model along with natural language inference auxiliary sentence (NLI-M) could boost the ABSA task accuracy up to 91%.
arXiv Detail & Related papers (2020-12-02T16:47:20Z)
- Understanding Pre-trained BERT for Aspect-based Sentiment Analysis [71.40586258509394]
This paper analyzes the hidden representations that BERT, pre-trained on reviews, learns for tasks in aspect-based sentiment analysis (ABSA).
It is not clear how the general proxy task of (masked) language modeling, trained on an unlabeled corpus without aspect or opinion annotations, can provide important features for downstream ABSA tasks.
arXiv Detail & Related papers (2020-10-31T02:21:43Z) - Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based
Sentiment Analysis [71.40390724765903]
Aspect-based sentiment analysis (ABSA) aims to predict the sentiment towards a specific aspect in the text.
Existing ABSA test sets cannot be used to probe whether a model can distinguish the sentiment of the target aspect from the non-target aspects.
We generate new examples to disentangle the confounding sentiments of the non-target aspects from the target aspect's sentiment.
arXiv Detail & Related papers (2020-09-16T22:38:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.