CEA-LIST at CheckThat! 2025: Evaluating LLMs as Detectors of Bias and Opinion in Text
- URL: http://arxiv.org/abs/2507.07539v1
- Date: Thu, 10 Jul 2025 08:35:05 GMT
- Title: CEA-LIST at CheckThat! 2025: Evaluating LLMs as Detectors of Bias and Opinion in Text
- Authors: Akram Elbouanani, Evan Dufraisse, Aboubacar Tuo, Adrian Popescu
- Abstract summary: This paper presents a competitive approach to multilingual subjectivity detection using large language models (LLMs) with few-shot prompting. We show that LLMs, when paired with carefully designed prompts, can match or outperform fine-tuned smaller language models (SLMs). Our system achieved top rankings across multiple languages in the CheckThat! 2025 subjectivity detection task.
- Score: 3.9845507207125967
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a competitive approach to multilingual subjectivity detection using large language models (LLMs) with few-shot prompting. We participated in Task 1: Subjectivity of the CheckThat! 2025 evaluation campaign. We show that LLMs, when paired with carefully designed prompts, can match or outperform fine-tuned smaller language models (SLMs), particularly in noisy or low-quality data settings. Despite experimenting with advanced prompt engineering techniques, such as debating LLMs and various example selection strategies, we found limited benefit beyond well-crafted standard few-shot prompts. Our system achieved top rankings across multiple languages in the CheckThat! 2025 subjectivity detection task, including first place in Arabic and Polish, and top-four finishes in Italian, English, German, and multilingual tracks. Notably, our method proved especially robust on the Arabic dataset, likely due to its resilience to annotation inconsistencies. These findings highlight the effectiveness and adaptability of LLM-based few-shot learning for multilingual sentiment tasks, offering a strong alternative to traditional fine-tuning, particularly when labeled data is scarce or inconsistent.
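To make the approach concrete, the sketch below shows the kind of standard few-shot prompt the abstract refers to, framed as SUBJ/OBJ sentence classification. The prompt wording, labeled examples, and the `call_llm` stub are illustrative assumptions, not the authors' actual prompts, example-selection strategy, or model pipeline.

```python
# Illustrative sketch of few-shot prompting for subjectivity detection (SUBJ vs. OBJ).
# All examples and wording here are hypothetical; the paper's prompts may differ.

FEW_SHOT_EXAMPLES = [
    ("The minister announced the new budget on Tuesday.", "OBJ"),
    ("This reckless budget will ruin ordinary families.", "SUBJ"),
    ("The study surveyed 1,200 participants across five countries.", "OBJ"),
    ("Frankly, the survey methodology is laughably weak.", "SUBJ"),
]

def build_prompt(sentence: str) -> str:
    """Assemble a plain few-shot classification prompt from the labeled examples."""
    lines = [
        "Classify each sentence as SUBJ (expresses an opinion or bias) "
        "or OBJ (states facts neutrally). Answer with a single label.",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Sentence: {text}\nLabel: {label}\n")
    lines.append(f"Sentence: {sentence}\nLabel:")
    return "\n".join(lines)

def classify(sentence: str, call_llm) -> str:
    """call_llm is any function mapping a prompt string to the model's text output."""
    raw = call_llm(build_prompt(sentence)).strip().upper()
    return "SUBJ" if raw.startswith("SUBJ") else "OBJ"

if __name__ == "__main__":
    # Dummy stand-in for an LLM so the sketch runs end to end without an API key.
    fake_llm = lambda prompt: "SUBJ"
    print(classify("In my view, the ruling was a disgrace.", fake_llm))
```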
Related papers
- Checklist Engineering Empowers Multilingual LLM Judges [12.64438771302935]
Checklist Engineering based LLM-as-a-Judge (CE-Judge) is a training-free framework that uses checklist intuition for multilingual evaluation with an open-source model. Our method generally surpasses the baselines and performs on par with the GPT-4o model.
arXiv Detail & Related papers (2025-07-09T12:03:06Z)
- mSTEB: Massively Multilingual Evaluation of LLMs on Speech and Text Tasks [11.996399504336624]
We introduce mSTEB, a new benchmark to evaluate the performance of large language models (LLMs) on a wide range of tasks. We evaluate the performance of leading LLMs such as Gemini 2.0 Flash and GPT-4o (Audio) and state-of-the-art open models such as Qwen 2 Audio and Gemma 3 27B.
arXiv Detail & Related papers (2025-06-10T03:15:08Z) - Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches [5.850200023135349]
We examine strategies to improve multilingual and crosslingual performance. We evaluate approaches on a dataset containing posts and claims in 47 languages. Most importantly, we show that crosslinguality is a setup with its own unique characteristics compared to the multilingual setup.
arXiv Detail & Related papers (2025-05-28T08:47:10Z) - Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs). We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models. Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z) - Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering [73.73820209993515]
We introduce KoLasSimpleQA, the first benchmark evaluating the multilingual factual ability of Large Language Models (LLMs). Inspired by existing research, we created the question set with features such as single knowledge point coverage, absolute objectivity, unique answers, and temporal stability. Results show significant performance differences between the two domains.
arXiv Detail & Related papers (2025-05-22T12:27:02Z) - MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language [16.21019515431378]
We propose MUG-Eval, a novel framework that evaluates large language models' multilingual generation capabilities. We transform existing benchmarks into conversational tasks and measure the LLMs' accuracies on those tasks. We evaluate 8 LLMs across 30 languages spanning high, mid, and low-resource categories, and we find that MUG-Eval correlates strongly with established benchmarks.
arXiv Detail & Related papers (2025-05-20T14:14:00Z) - PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts [79.84059473102778]
PolyMath is a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation.
arXiv Detail & Related papers (2025-04-25T15:39:04Z) - Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning.
This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for
Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing a few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - Efficiently Aligned Cross-Lingual Transfer Learning for Conversational
Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)