SentiGOLD: A Large Bangla Gold Standard Multi-Domain Sentiment Analysis
Dataset and its Evaluation
- URL: http://arxiv.org/abs/2306.06147v1
- Date: Fri, 9 Jun 2023 12:07:10 GMT
- Title: SentiGOLD: A Large Bangla Gold Standard Multi-Domain Sentiment Analysis
Dataset and its Evaluation
- Authors: Md. Ekramul Islam, Labib Chowdhury, Faisal Ahamed Khan, Shazzad
Hossain, Sourave Hossain, Mohammad Mamun Or Rashid, Nabeel Mohammed and
Mohammad Ruhul Amin
- Abstract summary: SentiGOLD adheres to established linguistic conventions agreed upon by the Government of Bangladesh and a Bangla linguistics committee.
The dataset incorporates data from online video comments, social media posts, blogs, news, and other sources while maintaining domain and class distribution rigorously.
The top model achieves a macro F1 score of 0.62 (intra-dataset) across 5 classes, setting a benchmark, and 0.61 (cross-dataset from SentNoB) across 3 classes, comparable to the state-of-the-art.
- Score: 0.9894420655516565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study introduces SentiGOLD, a Bangla multi-domain sentiment analysis
dataset. Comprising 70,000 samples, it was created from diverse sources and
annotated by a gender-balanced team of linguists. SentiGOLD adheres to
established linguistic conventions agreed upon by the Government of Bangladesh
and a Bangla linguistics committee. Unlike English and other languages, Bangla
lacks standard sentiment analysis datasets due to the absence of a national
linguistics framework. The dataset incorporates data from online video
comments, social media posts, blogs, news, and other sources while maintaining
domain and class distribution rigorously. It spans 30 domains (e.g., politics,
entertainment, sports) and includes 5 sentiment classes (strongly negative,
weakly negative, neutral, weakly positive, and strongly positive). The annotation scheme,
approved by the national linguistics committee, ensures a robust Inter
Annotator Agreement (IAA) with a Fleiss' kappa score of 0.88. Intra- and
cross-dataset evaluation protocols are applied to establish a standard
classification system. Cross-dataset evaluation on the noisy SentNoB dataset
presents a challenging test scenario. Additionally, zero-shot experiments
demonstrate the generalizability of SentiGOLD. The top model achieves a macro
F1 score of 0.62 (intra-dataset) across 5 classes, setting a benchmark, and
0.61 (cross-dataset from SentNoB) across 3 classes, comparable to the
state-of-the-art. The fine-tuned sentiment analysis model can be accessed at
https://sentiment.bangla.gov.bd.
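As a worked illustration of the reported inter-annotator agreement metric, the sketch below computes Fleiss' kappa from a toy annotation matrix using statsmodels; it is not the authors' pipeline, and the ratings are hypothetical.

```python
# Minimal sketch of a Fleiss' kappa computation (toy data, not the SentiGOLD
# annotations). Assumes every sample is rated by the same number of annotators.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One row per annotated sample, one column per annotator; values are class ids
# 0..4 for the five sentiment labels.
ratings = np.array([
    [0, 0, 1],
    [2, 2, 2],
    [4, 3, 4],
    [1, 1, 1],
])

table, _ = aggregate_raters(ratings)         # (n_samples, n_categories) count table
print(fleiss_kappa(table, method="fleiss"))  # agreement in [-1, 1]; the paper reports 0.88
```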
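The benchmark numbers are macro F1 scores: 5-class on the intra-dataset split and 3-class for the cross-dataset test against SentNoB. Below is a minimal sketch of that scoring, assuming a conventional negative/neutral/positive collapse of the five labels (the exact mapping is not spelled out in the abstract).

```python
# Hedged sketch of the two evaluation views; not the authors' implementation.
from sklearn.metrics import f1_score

# Assumed collapse of SentiGOLD's 5 labels onto SentNoB-style 3 labels.
FIVE_TO_THREE = {
    "strongly negative": "negative",
    "weakly negative": "negative",
    "neutral": "neutral",
    "weakly positive": "positive",
    "strongly positive": "positive",
}

def intra_dataset_macro_f1(y_true_5, y_pred_5):
    """5-class macro F1, the intra-dataset benchmark metric (reported: 0.62)."""
    return f1_score(y_true_5, y_pred_5, average="macro")

def cross_dataset_macro_f1(y_true_3, y_pred_5):
    """3-class macro F1 against SentNoB-style labels (reported: 0.61)."""
    y_pred_3 = [FIVE_TO_THREE[label] for label in y_pred_5]
    return f1_score(y_true_3, y_pred_3, average="macro")
```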
Related papers
- GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains [13.598485056526771]
We present and evaluate a new benchmark for PDTB-style shallow discourse parsing based on the existing UD English GUM corpus.
In a series of experiments on cross-domain relation classification, we show that while our dataset is compatible with PDTB, substantial out-of-domain degradation is observed.
arXiv Detail & Related papers (2024-11-01T10:04:43Z)
- You Shall Know a Tool by the Traces it Leaves: The Predictability of Sentiment Analysis Tools [74.98850427240464]
We show that sentiment analysis tools disagree on the same dataset.
We show that the sentiment tool used for sentiment annotation can even be predicted from its outcome.
arXiv Detail & Related papers (2024-10-18T17:27:38Z)
- RuBia: A Russian Language Bias Detection Dataset [3.8501658629243076]
We present a bias detection dataset specifically designed for the Russian language, dubbed RuBia.
The RuBia dataset is divided into 4 domains: gender, nationality, socio-economic status, and diverse.
There are nearly 2,000 unique sentence pairs spread over 19 subdomains in RuBia.
arXiv Detail & Related papers (2024-03-26T10:01:01Z)
- Paloma: A Benchmark for Evaluating Language Model Fit [114.63031978259467]
Language Model Assessment (Paloma) measures fit to 585 text domains.
We populate our benchmark with results from baselines pretrained on popular corpora.
arXiv Detail & Related papers (2023-12-16T19:12:45Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs [0.0]
This paper first constructs a user opinion dataset called ITRC-Opinion, built in a collaborative, in-house manner.
Our dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram.
Second, this study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts.
arXiv Detail & Related papers (2023-06-22T05:51:22Z)
- BD-SHS: A Benchmark Dataset for Learning to Detect Online Bangla Hate Speech in Different Social Contexts [1.5483942282713241]
This paper introduces a large manually labeled dataset that includes Hate Speech in different social contexts.
The dataset includes more than 50,200 offensive comments crawled from online social networking sites.
In experiments, we found that a word embedding trained exclusively on 1.47 million comments consistently resulted in better hate speech detection models.
arXiv Detail & Related papers (2022-06-01T10:10:15Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models for distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification [22.265865542786084]
We propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks.
Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models.
arXiv Detail & Related papers (2020-10-23T14:11:04Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization [98.61159823343036]
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.