Sentiment Classification in Bangla Textual Content: A Comparative Study
- URL: http://arxiv.org/abs/2011.10106v1
- Date: Thu, 19 Nov 2020 21:06:28 GMT
- Title: Sentiment Classification in Bangla Textual Content: A Comparative Study
- Authors: Md. Arid Hasan, Jannatul Tajrin, Shammur Absar Chowdhury, Firoj Alam
- Abstract summary: In this study, we explore several publicly available sentiment-labeled datasets and design classifiers using both classical and deep learning algorithms.
Our findings suggest that transformer-based models, which have not been explored earlier for Bangla, outperform all other models.
- Score: 4.2394281761764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment analysis has been widely used to understand our views on social and
political agendas or user experiences with a product. It is one of the core,
well-researched areas in NLP. However, for low-resource languages like
Bangla, one of the prominent challenges is the lack of resources. Another
important limitation, in the current literature for Bangla, is the absence of
comparable results due to the lack of a well-defined train/test split. In this
study, we explore several publicly available sentiment-labeled datasets and
design classifiers using both classical and deep learning algorithms. In our
study, the classical algorithms include SVM and Random Forest, and deep
learning algorithms include CNN, FastText, and transformer-based models. We
compare these models in terms of model performance and time-resource
complexity. Our findings suggest that transformer-based models, which have not
been explored earlier for Bangla, outperform all other models. Furthermore, we
created a weighted list of lexicon content based on the valence score per
class. We then analyzed the content for high-significance entries per class in
the datasets. For reproducibility, we make the data splits and the ranked
lexicon list publicly available. The presented results can be used as a
benchmark for future studies.
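The classical side of the comparison described in the abstract (SVM and Random Forest classifiers trained and evaluated on a fixed train/test split) can be sketched as below. This is an illustrative reconstruction with scikit-learn, not the authors' code; the tiny inline dataset and the character n-gram TF-IDF features are assumptions standing in for the released Bangla sentiment data splits and the paper's actual feature setup.

```python
# Illustrative sketch (not the authors' code): a classical sentiment baseline
# of the kind compared in the paper, using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

# Placeholder data: real experiments would load the released train/test splits.
train_texts = ["খুব ভালো", "খারাপ অভিজ্ঞতা", "চমৎকার সেবা", "একদম বাজে"]
train_labels = ["pos", "neg", "pos", "neg"]
test_texts = ["ভালো সেবা", "বাজে পণ্য"]
test_labels = ["pos", "neg"]

# Character n-grams are a common choice for morphologically rich,
# lower-resource languages; word n-grams would also be reasonable.
for name, clf in [("SVM", LinearSVC()),
                  ("Random Forest", RandomForestClassifier(n_estimators=100))]:
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)), clf)
    model.fit(train_texts, train_labels)
    preds = model.predict(test_texts)
    print(name, "macro-F1:", f1_score(test_labels, preds, average="macro"))
```

The same fixed split would then be reused for the deep models (CNN, FastText, transformers) so that all results remain directly comparable, which is the benchmarking point the abstract emphasizes.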
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews [1.869097450593631]
We present a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral.
We employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT.
Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features.
arXiv Detail & Related papers (2023-05-11T06:27:38Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- A Large Scale Search Dataset for Unbiased Learning to Rank [51.97967284268577]
We introduce the Baidu-ULTR dataset for unbiased learning to rank.
It contains 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries.
It provides: (1) the original semantic features and a pre-trained language model for easy usage; (2) sufficient display information such as position, displayed height, and displayed abstract; and (3) rich user feedback on search result pages (SERPs), such as dwell time.
arXiv Detail & Related papers (2022-07-07T02:37:25Z) - Empirical evaluation of shallow and deep learning classifiers for Arabic
sentiment analysis [1.1172382217477126]
This work presents a detailed comparison of the performance of deep learning models for sentiment analysis of Arabic reviews.
The datasets used in this study are multi-dialect Arabic hotel and book review datasets, which are some of the largest publicly available datasets for Arabic reviews.
Results showed that deep learning outperformed shallow learning for binary and multi-label classification, in contrast with the results of similar work reported in the literature.
arXiv Detail & Related papers (2021-12-01T14:45:43Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models for identifying the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Bangla Text Classification using Transformers [2.3475904942266697]
Text classification has been one of the earliest problems in NLP.
In this work, we fine-tune multilingual Transformer models for Bangla text classification tasks.
We obtain state-of-the-art results on six benchmark datasets, improving upon previous results by 5-29% accuracy across different tasks.
arXiv Detail & Related papers (2020-11-09T14:12:07Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.