SCRum-9: Multilingual Stance Classification over Rumours on Social Media
- URL: http://arxiv.org/abs/2505.18916v2
- Date: Wed, 17 Sep 2025 14:42:02 GMT
- Title: SCRum-9: Multilingual Stance Classification over Rumours on Social Media
- Authors: Yue Li, Jake Vasilakes, Zhixue Zhao, Carolina Scarton
- Abstract summary: SCRum-9 is the largest dataset for rumour analysis in 9 languages, containing 7,516 tweets from X. This paper innovates by exploring the use of multilingual synthetic data for stance classification. SCRum-9 is publicly released to the research community with potential to foster further research on multilingual analysis of misleading narratives on social media.
- Score: 15.412870757706473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce SCRum-9, the largest multilingual Stance Classification dataset for Rumour analysis in 9 languages, containing 7,516 tweets from X. SCRum-9 goes beyond existing stance classification datasets by covering more languages, linking examples to more fact-checked claims (2.1k), and including confidence-related annotations from multiple annotators to account for intra- and inter-annotator variability. Annotations were made by at least two native speakers per language, totalling more than 405 hours of annotation and 8,150 dollars in compensation. Further, SCRum-9 is used to benchmark five large language models (LLMs) and two multilingual masked language models (MLMs) in In-Context Learning (ICL) and fine-tuning setups. This paper also innovates by exploring the use of multilingual synthetic data for rumour stance classification, showing that even LLMs with weak ICL performance can produce valuable synthetic data for fine-tuning small MLMs, enabling them to achieve higher performance than zero-shot ICL in LLMs. Finally, we examine the relationship between model predictions and human uncertainty on ambiguous cases, finding that model predictions often match the second-choice labels assigned by annotators, rather than diverging entirely from human judgments. SCRum-9 is publicly released to the research community with potential to foster further research on multilingual analysis of misleading narratives on social media.
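The zero-shot ICL setup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact prompt: the label set (support/deny/query/comment is the conventional one for rumour stance) and both helper functions are assumptions.

```python
# Sketch of zero-shot in-context rumour stance classification.
# Assumes the conventional rumour-stance label set; prompt wording
# and helper names are illustrative, not taken from the paper.
LABELS = ["support", "deny", "query", "comment"]

def build_prompt(claim: str, tweet: str) -> str:
    """Pair a fact-checked claim with a tweet in a zero-shot prompt."""
    return (
        f"Claim: {claim}\n"
        f"Tweet: {tweet}\n"
        f"What is the stance of the tweet towards the claim? "
        f"Answer with one of: {', '.join(LABELS)}."
    )

def parse_stance(model_output: str) -> str:
    """Map a free-form model reply onto the label set.

    Falls back to 'comment' when no label is found in the reply.
    """
    reply = model_output.lower()
    for label in LABELS:
        if label in reply:
            return label
    return "comment"
```

In a fine-tuning setup, the same (claim, tweet, label) triples would instead be fed to a small MLM classifier, which is how the abstract's synthetic-data experiments use LLM outputs.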
Related papers
- When Scale Meets Diversity: Evaluating Language Models on Fine-Grained Multilingual Claim Verification [14.187153195380668]
Large language models have remarkable capabilities across many NLP tasks, but their effectiveness for multilingual claim verification with nuanced classification schemes remains understudied. We evaluate five state-of-the-art language models on the X-Fact dataset, which spans 25 languages with seven distinct veracity categories. Surprisingly, we find that XLM-R substantially outperforms all tested LLMs, achieving 57.7% macro-F1 compared to the best LLM performance of 16.9%.
arXiv Detail & Related papers (2025-07-28T10:49:04Z)
- Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters [53.59868121093848]
We introduce Seed-X, a family of open-source large language models (LLMs) with 7B parameters. The base model is pre-trained on a diverse, high-quality dataset encompassing both monolingual and bilingual content across 28 languages. The instruct model is then fine-tuned to translate via Chain-of-Thought (CoT) reasoning and further enhanced through reinforcement learning (RL) to achieve better generalization across diverse language pairs.
arXiv Detail & Related papers (2025-07-18T03:19:43Z)
- Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval [49.1574468325115]
We introduce Amharic-specific dense retrieval models based on pre-trained Amharic BERT and RoBERTa backbones. Our proposed RoBERTa-Base-Amharic-Embed model (110M parameters) achieves a 17.6% relative improvement in MRR@10. More compact variants, such as RoBERTa-Medium-Amharic-Embed (42M), remain competitive while being over 13x smaller.
arXiv Detail & Related papers (2025-05-25T23:06:20Z)
- A Multi-Task Benchmark for Abusive Language Detection in Low-Resource Settings [8.361945776819528]
This work presents a large-scale human-annotated benchmark dataset for abusive language detection in Tigrinya social media. The dataset comprises 13,717 YouTube comments annotated by nine native speakers, collected from 7,373 videos with a total of over 1.2 billion views across 51 channels. Our experiments reveal that small, specialized multi-task models outperform the current frontier models in the low-resource setting.
arXiv Detail & Related papers (2025-05-17T18:52:47Z)
- Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models [10.231866835957538]
Chain-of-thought (CoT) has emerged as a critical mechanism for enhancing reasoning capabilities in large language models (LLMs). We propose the Cross-Lingual Consistency (CLC) framework, which integrates multilingual reasoning paths through majority voting to elevate LLMs' reasoning capabilities. Empirical evaluations on the CMATH dataset reveal CLC's superiority over the conventional self-consistency method.
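The majority-voting step of CLC can be sketched as below. The function name is hypothetical, and generating the multilingual reasoning paths themselves (the bulk of the framework) is elided:

```python
from collections import Counter

def clc_vote(answers: list[str]) -> str:
    """Majority vote over final answers extracted from reasoning
    paths produced in different languages; ties are broken by the
    answer that appeared first."""
    return Counter(answers).most_common(1)[0][0]
```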
arXiv Detail & Related papers (2025-04-02T16:09:39Z)
- "Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation [90.09260023184932]
Retrieval-Augmented Generation (RAG) grounds Large Language Model (LLM) output by leveraging external knowledge sources to reduce factual hallucinations.
NoMIRACL is a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages.
We measure relevance assessment using: (i) hallucination rate, the model's tendency to produce an answer on the non-relevant subset, where the answer is not present in the passages, and (ii) error rate, the model's failure to recognize relevant passages in the relevant subset.
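Under those definitions, the two rates can be computed roughly as follows (a sketch with an assumed `"no_answer"` abstention marker; NoMIRACL's actual answer matching is more involved):

```python
def hallucination_rate(preds_on_nonrelevant: list[str]) -> float:
    """Fraction of non-relevant-subset queries where the model
    produced an answer instead of abstaining, even though no
    passage contains the answer."""
    answered = [p for p in preds_on_nonrelevant if p != "no_answer"]
    return len(answered) / len(preds_on_nonrelevant)

def error_rate(preds_on_relevant: list[str]) -> float:
    """Fraction of relevant-subset queries where the model abstained
    and thus failed to recognize a passage containing the answer."""
    missed = [p for p in preds_on_relevant if p == "no_answer"]
    return len(missed) / len(preds_on_relevant)
```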
arXiv Detail & Related papers (2023-12-18T17:18:04Z)
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers the exploration and training of powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z)
- The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64 Languages [17.055109973224265]
We present SPARROW, an extensive benchmark specifically designed for cross-lingual sociopragmatic meaning (SM) understanding.
SPARROW comprises 169 datasets covering 13 task types across six primary categories (e.g., anti-social language detection, emotion recognition).
We evaluate the performance of various multilingual pretrained language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT) on SPARROW through fine-tuning, zero-shot, and/or few-shot learning.
arXiv Detail & Related papers (2023-10-23T04:22:44Z)
- Text Classification via Large Language Models [63.1874290788797]
We introduce Clue And Reasoning Prompting (CARP) to address complex linguistic phenomena involved in text classification.
Remarkably, CARP yields new SOTA performances on 4 out of 5 widely-used text-classification benchmarks.
More importantly, we find that CARP delivers impressive abilities on low-resource and domain-adaptation setups.
arXiv Detail & Related papers (2023-05-15T06:24:45Z)
- SLING: Sino Linguistic Evaluation of Large Language Models [34.42512869432145]
Sino LINGuistics (SLING) consists of 38K minimal sentence pairs in Mandarin Chinese grouped into 9 high-level linguistic phenomena.
We test 18 publicly available pretrained monolingual (e.g., BERT-base-zh) and multi-lingual (e.g., mT5, XLM) language models on SLING.
Our experiments show that the average accuracy for LMs is far below human performance (69.7% vs. 97.1%), while BERT-base-zh achieves the highest accuracy (84.8%) of all tested LMs, even much larger ones.
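Minimal-pair accuracy of the kind SLING reports can be sketched as: a model is correct on a pair when it assigns a higher score (e.g., pseudo-log-likelihood) to the acceptable sentence. The scoring function here is an assumed stand-in for a real LM scorer:

```python
from typing import Callable

def minimal_pair_accuracy(
    pairs: list[tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (acceptable, unacceptable) sentence pairs where
    the model scores the acceptable sentence strictly higher."""
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)
```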
arXiv Detail & Related papers (2022-10-21T02:29:39Z)
- A Multi-level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference [54.678516076366506]
Natural Language Inference (NLI) is an increasingly essential task in natural language understanding.
Here we propose a multi-level supervised contrastive learning framework named MultiSCL for low-resource natural language inference.
arXiv Detail & Related papers (2022-05-31T05:54:18Z)
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.), to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)
- A Bayesian Multilingual Document Model for Zero-shot Topic Identification and Discovery [1.9215779751499527]
The model is an extension of BaySMM [Kesiraju et al., 2020] to the multilingual scenario.
We propagate the learned uncertainties through linear classifiers that benefit zero-shot cross-lingual topic identification.
We revisit cross-lingual topic identification in zero-shot settings by taking a deeper dive into current datasets.
arXiv Detail & Related papers (2020-07-02T19:55:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.