LASTIST: LArge-Scale Target-Independent STance dataset
- URL: http://arxiv.org/abs/2510.25783v1
- Date: Tue, 28 Oct 2025 11:07:29 GMT
- Title: LASTIST: LArge-Scale Target-Independent STance dataset
- Authors: DongJae Kim, Yaejin Lee, Minsu Park, Eunil Park
- Abstract summary: Stance detection has emerged as an area of research in the field of artificial intelligence. Most research is currently centered on the target-dependent stance detection task. Most benchmark datasets are based on English, making it difficult to develop models in low-resource languages such as Korean. This study proposes the LArge-Scale Target-Independent STance (LASTIST) dataset to fill this research gap.
- Score: 16.439668986979353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stance detection has emerged as an area of research in the field of artificial intelligence. However, most research is currently centered on the target-dependent stance detection task, which is based on a person's stance in favor of or against a specific target. Furthermore, most benchmark datasets are based on English, making it difficult to develop models in low-resource languages such as Korean, especially for an emerging field such as stance detection. This study proposes the LArge-Scale Target-Independent STance (LASTIST) dataset to fill this research gap. Collected from the press releases of two Korean political parties, the LASTIST dataset comprises 563,299 labeled Korean sentences. We provide a detailed description of how we collected and constructed the dataset and trained state-of-the-art deep learning and stance detection models. Our LASTIST dataset is designed for various tasks in stance detection, including target-independent stance detection and diachronic evolution stance detection. We release our dataset at https://anonymous.4open.science/r/LASTIST-3721/.
Related papers
- Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models [19.85460397012729]
Agency expected of Agentic Large Language Models goes beyond answering correctly, requiring autonomy to set goals and decide what to explore. We term this investigatory intelligence, distinguishing it from executional intelligence, which merely completes assigned tasks. To address this, we introduce Deep Data Research (DDR), an open-ended task where LLMs autonomously extract key insights from databases, and DDR-Bench, a large-scale, checklist-based benchmark that enables verifiable evaluation.
arXiv Detail & Related papers (2026-02-02T12:36:57Z)
- Zero-Shot Conversational Stance Detection: Dataset and Approaches [24.892337124161983]
Stance detection aims to identify public opinion towards specific targets using social media data. We manually curate a large-scale, high-quality zero-shot conversational stance detection dataset, named ZS-CSD. We propose SITPCL, a speaker interaction and target-aware prototypical contrastive learning model, and establish the benchmark performance in the zero-shot setting.
arXiv Detail & Related papers (2025-06-21T12:02:06Z)
- Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning [51.170479006249195]
We introduce a new dataset, benchmark, and a dynamic coarse-to-fine learning scheme in this study. Our proposed dataset, AI-TOD-R, features the smallest object sizes among all oriented object detection datasets. We present a benchmark spanning a broad range of detection paradigms, including both fully-supervised and label-efficient approaches.
arXiv Detail & Related papers (2024-12-16T09:14:32Z)
- A Challenge Dataset and Effective Models for Conversational Stance Detection [26.208989232347058]
We introduce a new multi-turn conversation stance detection dataset (called MT-CSD).
We propose a global-local attention network (GLAN) to address both long and short-range dependencies inherent in conversational data.
Our dataset serves as a valuable resource to catalyze advancements in cross-domain stance detection.
arXiv Detail & Related papers (2024-03-17T08:51:01Z)
- How to Solve Few-Shot Abusive Content Detection Using the Data We Actually Have [58.23138483086277]
In this work we leverage datasets we already have, covering a wide range of tasks related to abusive language detection.
Our goal is to build models cheaply for a new target label set and/or language, using only a few training examples of the target domain.
Our experiments show that using existing datasets and only a few shots of the target task improves model performance both monolingually and across languages.
arXiv Detail & Related papers (2023-05-23T14:04:12Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Data Selection for Language Models via Importance Resampling [90.9263039747723]
We formalize the problem of selecting a subset of a large raw unlabeled dataset to match a desired target distribution.
We extend the classic importance resampling approach used in low dimensions to LM data selection.
We instantiate the DSIR framework with hashed n-gram features for efficiency, enabling the selection of 100M documents in 4.5 hours.
arXiv Detail & Related papers (2023-02-06T23:57:56Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- X-Stance: A Multilingual Multi-Target Dataset for Stance Detection [42.46681912294797]
We extract a large-scale stance detection dataset from comments written by candidates of elections in Switzerland.
The dataset consists of German, French and Italian text, allowing for a cross-lingual evaluation of stance detection.
arXiv Detail & Related papers (2020-03-18T17:58:10Z)
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.