BAN-ABSA: An Aspect-Based Sentiment Analysis dataset for Bengali and
its baseline evaluation
- URL: http://arxiv.org/abs/2012.00288v1
- Date: Tue, 1 Dec 2020 06:09:44 GMT
- Title: BAN-ABSA: An Aspect-Based Sentiment Analysis dataset for Bengali and
its baseline evaluation
- Authors: Mahfuz Ahmed Masum, Sheikh Junayed Ahmed, Ayesha Tasnim, Md Saiful
Islam
- Abstract summary: We present a manually annotated Bengali dataset of high quality, BAN-ABSA, which is annotated with aspect and its associated sentiment by 3 native Bengali speakers.
The dataset consists of 2,619 positive, 4,721 negative and 1,669 neutral data samples from 9,009 unique comments gathered from some famous Bengali news portals.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Owing to the rapid growth of user comments on social media, news
portals, and online product reviews, sentiment analysis (SA) has captured
substantial interest from researchers. As application domains multiply, SA
work aims not only to predict the sentiment of a sentence or document but also
to provide the necessary detail on its different aspects (i.e., aspect-based
sentiment analysis). A considerable number of datasets for SA and aspect-based
sentiment analysis (ABSA) are available for English and other well-known
European languages. In this paper, we present BAN-ABSA, a high-quality,
manually annotated Bengali dataset in which each comment is labeled with an
aspect and its associated sentiment by 3 native Bengali speakers. The dataset
consists of 2,619 positive, 4,721 negative, and 1,669 neutral data samples
drawn from 9,009 unique comments gathered from popular Bengali news portals.
In addition, we conducted a baseline evaluation focused on deep learning
models, achieving an accuracy of 78.75% for aspect term extraction and 71.08%
for sentiment classification. Experiments on the BAN-ABSA dataset show that
the CNN model is better in terms of accuracy, though Bi-LSTM significantly
outperforms CNN in terms of average F1-score.
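The abstract describes a two-stage setup: aspect term extraction followed by 3-way sentiment classification of each extracted aspect. The toy sketch below illustrates only the task structure and label format; the lexicons, rules, and English tokens are hypothetical stand-ins for the actual CNN/Bi-LSTM models and the BAN-ABSA schema.

```python
# Illustrative two-stage ABSA pipeline. Stage 1 extracts aspect terms;
# stage 2 assigns each aspect a positive/negative/neutral label.
# All lexicons here are toy assumptions, not part of BAN-ABSA.

ASPECT_LEXICON = {"camera", "battery", "screen"}   # toy aspect terms
NEGATIVE_CUES = {"bad", "poor", "terrible"}
POSITIVE_CUES = {"good", "great", "excellent"}

def extract_aspects(tokens):
    """Stage 1: aspect term extraction (here, a simple lexicon lookup)."""
    return [t for t in tokens if t in ASPECT_LEXICON]

def classify_sentiment(tokens):
    """Stage 2: 3-way sentiment over the comment's tokens."""
    if any(t in NEGATIVE_CUES for t in tokens):
        return "negative"
    if any(t in POSITIVE_CUES for t in tokens):
        return "positive"
    return "neutral"

def absa(comment):
    """Return (aspect, sentiment) pairs for one comment."""
    tokens = comment.lower().split()
    return [(a, classify_sentiment(tokens)) for a in extract_aspects(tokens)]

print(absa("The battery is terrible"))  # [('battery', 'negative')]
```

In the paper's baselines, the lexicon lookup would be replaced by a learned sequence tagger and the cue rules by a CNN or Bi-LSTM classifier.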
Related papers
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents using Not Safe For Work (NSFW) scores computed from images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
- SOUL: Towards Sentiment and Opinion Understanding of Language [96.74878032417054]
We propose a new task called Sentiment and Opinion Understanding of Language (SOUL)
SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG)
arXiv Detail & Related papers (2023-10-27T06:48:48Z)
- Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
arXiv Detail & Related papers (2023-09-24T19:26:53Z)
- BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews [1.869097450593631]
We present a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral.
We employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT.
Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features.
arXiv Detail & Related papers (2023-05-11T06:27:38Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We demonstrate the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Arabic aspect based sentiment analysis using BERT [0.0]
This article explores the modeling capabilities of contextual embeddings from pre-trained language models, such as BERT.
We are building a simple but effective BERT-based neural baseline to handle this task.
Our BERT architecture with a simple linear classification layer surpassed the state-of-the-art works, according to the experimental results.
arXiv Detail & Related papers (2021-07-28T11:34:00Z)
- Fine-tuning Pretrained Multilingual BERT Model for Indonesian Aspect-based Sentiment Analysis [0.0]
Previous research on Aspect-based Sentiment Analysis (ABSA) for Indonesian reviews in hotel domain has been conducted using CNN and XGBoost.
In this paper, we incorporate one of the foremost language representation models, BERT, to perform ABSA on an Indonesian reviews dataset.
arXiv Detail & Related papers (2021-03-05T15:05:51Z)
- Sentiment analysis in Bengali via transfer learning using multi-lingual BERT [0.9883261192383611]
In this paper, we present manually tagged 2-class and 3-class SA datasets in Bengali.
We also demonstrate that the multi-lingual BERT model with relevant extensions can be trained via the approach of transfer learning.
This deep learning model achieves an accuracy of 71% for 2-class sentiment classification compared to the current state-of-the-art accuracy of 68%.
arXiv Detail & Related papers (2020-12-03T10:21:11Z)
- Exploiting BERT to improve aspect-based sentiment analysis performance on Persian language [0.0]
This research shows the potential of using pre-trained BERT model and taking advantage of using sentence-pair input on an ABSA task.
The results indicate that employing Pars-BERT pre-trained model along with natural language inference auxiliary sentence (NLI-M) could boost the ABSA task accuracy up to 91%.
arXiv Detail & Related papers (2020-12-02T16:47:20Z)
- Understanding Pre-trained BERT for Aspect-based Sentiment Analysis [71.40586258509394]
This paper analyzes the hidden representations that BERT, pre-trained on reviews, learns for tasks in aspect-based sentiment analysis (ABSA).
It is not clear how the general proxy task of (masked) language modeling, trained on an unlabeled corpus without aspect or opinion annotations, can provide important features for downstream ABSA tasks.
arXiv Detail & Related papers (2020-10-31T02:21:43Z) - Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based
Sentiment Analysis [71.40390724765903]
Aspect-based sentiment analysis (ABSA) aims to predict the sentiment towards a specific aspect in the text.
Existing ABSA test sets cannot be used to probe whether a model can distinguish the sentiment of the target aspect from the non-target aspects.
We generate new examples to disentangle the confounding sentiments of the non-target aspects from the target aspect's sentiment.
arXiv Detail & Related papers (2020-09-16T22:38:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.