Czech Dataset for Complex Aspect-Based Sentiment Analysis Tasks
- URL: http://arxiv.org/abs/2508.08125v1
- Date: Mon, 11 Aug 2025 16:03:28 GMT
- Title: Czech Dataset for Complex Aspect-Based Sentiment Analysis Tasks
- Authors: Jakub Šmíd, Pavel Přibáň, Ondřej Pražák, Pavel Král
- Abstract summary: This paper introduces a novel dataset for aspect-based sentiment analysis (ABSA). It consists of 3.1K manually annotated reviews from the restaurant domain. We provide 24M reviews without annotations suitable for unsupervised learning.
- Score: 0.7874708385247352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce a novel Czech dataset for aspect-based sentiment analysis (ABSA), which consists of 3.1K manually annotated reviews from the restaurant domain. The dataset is built upon the older Czech dataset, which contained only separate labels for the basic ABSA tasks such as aspect term extraction or aspect polarity detection. Unlike its predecessor, our new dataset is specifically designed for more complex tasks, e.g. target-aspect-category detection. These advanced tasks require a unified annotation format, seamlessly linking sentiment elements (labels) together. Our dataset follows the format of the well-known SemEval-2016 datasets. This design choice allows effortless application and evaluation in cross-lingual scenarios, ultimately fostering cross-language comparisons with equivalent counterpart datasets in other languages. The annotation process engaged two trained annotators, yielding an impressive inter-annotator agreement rate of approximately 90%. Additionally, we provide 24M reviews without annotations suitable for unsupervised learning. We present robust monolingual baseline results achieved with various Transformer-based models and insightful error analysis to supplement our contributions. Our code and dataset are freely available for non-commercial research purposes.
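The abstract says the dataset follows the SemEval-2016 format, in which each opinion links a target, an aspect category, and a polarity. A minimal sketch of reading such annotations is given here, assuming the commonly used SemEval-2016 Task 5 XML layout (`sentence`, `text`, and `Opinion` elements with `target`, `category`, `polarity`, `from`, and `to` attributes). The sample review, the `SAMPLE_XML` name, and the `parse_opinions` helper are hypothetical illustrations, not part of the released dataset or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the assumed SemEval-2016 Task 5 layout.
# The sentence and its labels are invented for illustration only.
SAMPLE_XML = """<Reviews>
  <Review rid="r1">
    <sentences>
      <sentence id="r1:0">
        <text>Polévka byla výborná, ale obsluha pomalá.</text>
        <Opinions>
          <Opinion target="Polévka" category="FOOD#QUALITY" polarity="positive" from="0" to="7"/>
          <Opinion target="obsluha" category="SERVICE#GENERAL" polarity="negative" from="26" to="33"/>
        </Opinions>
      </sentence>
    </sentences>
  </Review>
</Reviews>"""


def parse_opinions(xml_text):
    """Return one dict per Opinion element, keeping the linked sentiment elements together."""
    root = ET.fromstring(xml_text)
    rows = []
    for sentence in root.iter("sentence"):
        text = sentence.findtext("text", default="")
        for opinion in sentence.iter("Opinion"):
            rows.append({
                "sentence_id": sentence.get("id"),
                "text": text,
                "target": opinion.get("target"),
                "category": opinion.get("category"),
                "polarity": opinion.get("polarity"),
                # Character offsets of the target inside the sentence text.
                "span": (int(opinion.get("from")), int(opinion.get("to"))),
            })
    return rows


if __name__ == "__main__":
    for row in parse_opinions(SAMPLE_XML):
        print(row["target"], row["category"], row["polarity"])
```

Keeping target, category, and polarity in one record is what makes tasks such as target-aspect-category detection possible, since the labels stay linked rather than being stored in separate files.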
Related papers
- Extending Czech Aspect-Based Sentiment Analysis with Opinion Terms: Dataset and LLM Benchmarks [1.9779500088459443]
This paper introduces a novel Czech dataset in the restaurant domain for aspect-based sentiment analysis (ABSA).
We conduct extensive experiments using modern Transformer-based models, including large language models (LLMs) in monolingual, cross-lingual, and multilingual settings.
A detailed error analysis reveals key challenges, including the detection of subtle opinion terms and nuanced sentiment expressions.
arXiv Detail & Related papers (2026-02-26T08:13:42Z)
- Logos as a Well-Tempered Pre-train for Sign Language Recognition [75.42794328290088]
This paper presents Logos, a novel Russian Sign Language (RSL) dataset.
It is shown that a model pre-trained on the Logos dataset can be used as a universal encoder for SLR tasks in other languages.
We show that explicitly labeling visually similar signs improves trained model quality as a visual encoder for downstream tasks.
arXiv Detail & Related papers (2025-05-15T16:31:49Z)
- An Aspect Extraction Framework using Different Embedding Types, Learning Models, and Dependency Structure [0.0657714808721181]
An important component of aspect-based sentiment analysis is aspect extraction, which involves identifying and extracting aspect terms from text.
In this paper, we propose aspect extraction models that use different types of embeddings for words and part-of-speech tags.
We also propose tree positional encoding that is based on dependency parsing output to better capture the aspect positions in sentences.
arXiv Detail & Related papers (2025-03-05T13:57:48Z)
- KPC-cF: Aspect-Based Sentiment Analysis via Implicit-Feature Alignment with Corpus Filtering [0.0]
Our research proposes an intuitive and effective framework for ABSA in low-resource languages such as Korean.
It optimizes prediction labels by integrating a translated benchmark and unlabeled Korean data.
Compared to English ABSA, our framework showed an approximately 3% difference in F1 scores and accuracy.
arXiv Detail & Related papers (2024-06-29T07:01:51Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation [64.9546787488337]
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation.
The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese.
arXiv Detail & Related papers (2022-10-01T05:02:04Z)
- Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization [80.94424037751243]
In zero-shot multilingual extractive text summarization, a model is typically trained on an English dataset and then applied to summarization datasets of other languages.
We propose NLS (Neural Label Search for Summarization), which jointly learns hierarchical weights for different sets of labels together with our summarization model.
We conduct multilingual zero-shot summarization experiments on MLSUM and WikiLingua datasets, and we achieve state-of-the-art results using both human and automatic evaluations.
arXiv Detail & Related papers (2022-04-28T14:02:16Z)
- CL-XABSA: Contrastive Learning for Cross-lingual Aspect-based Sentiment Analysis [4.60495447017298]
We propose a novel framework, CL-XABSA: Contrastive Learning for Cross-lingual Aspect-Based Sentiment Analysis.
Specifically, we design two contrastive strategies, token-level contrastive learning of token embeddings (TL-CTE) and sentiment-level contrastive learning of token embeddings (SL-CTE).
Since our framework can receive datasets in multiple languages during training, it can be adapted not only for the XABSA task, but also for multilingual aspect-based sentiment analysis (MABSA).
arXiv Detail & Related papers (2022-04-02T07:40:03Z)
- Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets.
We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy.
Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z)
- YASO: A New Benchmark for Targeted Sentiment Analysis [12.60266470026856]
We present YASO -- a new crowd-sourced TSA evaluation dataset.
The dataset contains 2,215 English sentences from movie, business and product reviews, and 7,415 terms and their sentiments annotated within these sentences.
Our analysis verifies the reliability of our annotations, and explores the characteristics of the collected data.
arXiv Detail & Related papers (2020-12-29T00:25:15Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.