KazSAnDRA: Kazakh Sentiment Analysis Dataset of Reviews and Attitudes
- URL: http://arxiv.org/abs/2403.19335v2
- Date: Tue, 9 Apr 2024 21:06:32 GMT
- Title: KazSAnDRA: Kazakh Sentiment Analysis Dataset of Reviews and Attitudes
- Authors: Rustem Yeshpanov, Huseyin Atakan Varol
- Abstract summary: KazSAnDRA comprises an extensive collection of 180,064 reviews obtained from various sources and includes numerical ratings ranging from 1 to 5.
The study also pursued the automation of Kazakh sentiment classification through the development and evaluation of four machine learning models.
- Score: 3.4975081145096665
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents KazSAnDRA, a dataset developed for Kazakh sentiment analysis that is the first and largest publicly available dataset of its kind. KazSAnDRA comprises an extensive collection of 180,064 reviews obtained from various sources and includes numerical ratings ranging from 1 to 5, providing a quantitative representation of customer attitudes. The study also pursued the automation of Kazakh sentiment classification through the development and evaluation of four machine learning models trained for both polarity classification and score classification. Experimental analysis included evaluation of the results considering both balanced and imbalanced scenarios. The most successful model attained an F1-score of 0.81 for polarity classification and 0.39 for score classification on the test sets. The dataset and fine-tuned models are open access and available for download under the Creative Commons Attribution 4.0 International License (CC BY 4.0) through our GitHub repository.
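The abstract does not spell out how the 1-to-5 ratings are reduced to polarity labels. A common convention for review corpora, and a plausible reading here, is to treat ratings of 1-2 as negative and 4-5 as positive, discarding the ambiguous rating 3. The sketch below illustrates that mapping together with a macro-F1 evaluation of a simple baseline; the file name reviews.csv and its columns are assumptions for illustration, not the actual layout of the KazSAnDRA repository, and the baseline is not one of the authors' fine-tuned models.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical layout: one row per review with its 1-5 rating.
df = pd.read_csv("reviews.csv")  # assumed columns: "text", "rating"

# Assumed polarity mapping: 1-2 -> negative (0), 4-5 -> positive (1);
# rating-3 reviews are dropped as ambiguous. The paper's own mapping
# may differ.
df = df[df["rating"] != 3].copy()
df["polarity"] = (df["rating"] >= 4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["polarity"],
    test_size=0.2, random_state=42, stratify=df["polarity"])

# A TF-IDF + logistic regression baseline, far simpler than the
# transformer models evaluated in the paper.
vectorizer = TfidfVectorizer(max_features=50_000)
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)
preds = clf.predict(vectorizer.transform(X_test))

# Macro F1 weights both classes equally under class imbalance.
print("polarity F1:", f1_score(y_test, preds, average="macro"))
```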
Related papers
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer that assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Evaluating the Generation Capabilities of Large Chinese Language Models [27.598864484231477]
This paper unveils CG-Eval, the first comprehensive and automated framework for evaluating the generative capabilities of large Chinese language models across a spectrum of academic disciplines.
Gscore automates the quality measurement of a model's text generation against reference standards.
arXiv Detail & Related papers (2023-08-09T09:22:56Z)
- ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding [86.08738156304224]
We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts.
We adapt six tasks from the SCROLLS benchmark, and add four new datasets, including two novel information fusing tasks.
We find that Claude outperforms ChatGPT, and GPT-4 achieves the highest average score.
arXiv Detail & Related papers (2023-05-23T16:15:31Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under the accompanying robustness metric, a model is judged robust only if its performance is consistently accurate across all the examples within each clique; one strict reading of this criterion is sketched after this entry.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
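The summary leaves the exact robustness criterion implicit. A minimal sketch, assuming the strictest reading, is that a model is credited for a clique only when it is correct on every knowledge-invariant variant in it; the function name and input format below are illustrative assumptions.

```python
def clique_robust_accuracy(cliques_correct):
    """Fraction of cliques on which the model is correct for every
    knowledge-invariant variant. `cliques_correct` is a list of
    lists of booleans, one inner list per clique."""
    return sum(all(clique) for clique in cliques_correct) / len(cliques_correct)

# Example: perfect on clique 1, one miss in clique 2 -> 0.5.
print(clique_robust_accuracy([[True, True, True], [True, False, True]]))
```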
- Evaluation of Embedding Models for Automatic Extraction and Classification of Acknowledged Entities in Scientific Documents [5.330844352905488]
The aim of the paper is to evaluate the performance of different embedding models for the task of automatic extraction and classification of acknowledged entities.
The training was conducted using three default Flair NER models with two differently-sized corpora.
Our model recognizes six entity types: funding agency, grant number, individual, university, corporation, and miscellaneous. A minimal Flair inference sketch follows this entry.
arXiv Detail & Related papers (2022-06-22T09:32:28Z)
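For readers unfamiliar with Flair, the snippet below shows what inference with a pretrained Flair tagger looks like. Note that the stock "ner" model predicts generic PER/LOC/ORG/MISC tags; the cited paper trains its own Flair models for the six acknowledgment-specific classes, and the example sentence here is invented for illustration.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Stock English NER tagger (PER/LOC/ORG/MISC), used here only to
# illustrate the Flair API; the paper trains custom taggers for its
# six acknowledgment entity types.
tagger = SequenceTagger.load("ner")

sentence = Sentence(
    "We thank the National Science Foundation for support under grant 1234567."
)
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    # Each predicted span exposes its text, tag, and confidence.
    print(span.text, span.tag, round(span.score, 3))
```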
- KazNERD: Kazakh Named Entity Recognition Dataset [5.094176584161206]
We present the development of a dataset for Kazakh named entity recognition.
The dataset was built as there is a clear need for publicly available annotated corpora in Kazakh.
The resulting dataset contains 112,702 sentences and 136,333 annotations for 25 entity classes.
arXiv Detail & Related papers (2021-11-26T10:56:19Z)
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design; the standard count-based Best-Worst Scaling aggregation is sketched after this entry.
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
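Best-Worst Scaling is conventionally aggregated with a simple counting rule: each item's score is the number of times it was chosen best, minus the number of times it was chosen worst, divided by the number of times it was shown. A minimal sketch of that standard aggregation follows; the cited paper's exact protocol may add details (tuple construction, annotator filtering) not modeled here.

```python
from collections import Counter

def bws_scores(judgments):
    """Count-based Best-Worst Scaling aggregation.
    `judgments` is a list of (items, best, worst) tuples, where
    `items` is the tuple of items shown to an annotator.
    Scores fall in [-1, 1]."""
    best, worst, shown = Counter(), Counter(), Counter()
    for items, b, w in judgments:
        shown.update(items)
        best[b] += 1
        worst[w] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Toy example: three annotators each judge the same four summaries.
judgments = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "D"), "B", "D"),
    (("A", "B", "C", "D"), "A", "C"),
]
print(bws_scores(judgments))  # A scores highest, D lowest
```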
- KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset [4.542831770689362]
This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide.
The dataset consists of about 91 hours of transcribed audio recordings spoken by two professional speakers.
It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech applications in both academia and industry.
arXiv Detail & Related papers (2021-04-17T05:49:57Z)
- What Can We Learn from Collective Human Opinions on Natural Language Inference Data? [88.90490998032429]
ChaosNLI is a dataset with a total of 464,500 annotations to study Collective HumAn OpinionS.
This dataset is created by collecting 100 annotations per example for 3,113 examples in SNLI and MNLI and 1,532 examples in Abductive-NLI.
arXiv Detail & Related papers (2020-10-07T17:26:06Z)
- SummEval: Re-evaluating Summarization Evaluation [169.622515287256]
We re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion.
We benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics.
We assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset.
arXiv Detail & Related papers (2020-07-24T16:25:19Z)