Related papers: Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection

URL: http://arxiv.org/abs/2509.25138v1
Date: Mon, 29 Sep 2025 17:50:32 GMT
Title: Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection
Authors: Ivan Vykopal, Antonia Karamolegkou, Jaroslav Kopčan, Qiwei Peng, Tomáš Javůrek, Michal Gregor, Marián Šimko
Abstract summary: Large Language Models (LLMs) offer powerful capabilities for cross-lingual fact-checking. LLMs often exhibit language bias, performing disproportionately better on high-resource languages such as English. We present and inspect a novel concept - retrieval bias, when information retrieval systems tend to favor certain information over others.
Score: 4.6738956348193
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multilingual Large Language Models (LLMs) offer powerful capabilities for cross-lingual fact-checking. However, these models often exhibit language bias, performing disproportionately better on high-resource languages such as English than on low-resource counterparts. We also present and inspect a novel concept, retrieval bias, in which information retrieval systems tend to favor certain information over others, leaving the retrieval process skewed. In this paper, we study language and retrieval bias in the context of Previously Fact-Checked Claim Detection (PFCD). We evaluate six open-source multilingual LLMs across 20 languages using a fully multilingual prompting strategy, leveraging the AMC-16K dataset. By translating task prompts into each language, we uncover disparities in monolingual and cross-lingual performance and identify key trends based on model family, size, and prompting strategy. Our findings highlight persistent bias in LLM behavior and offer recommendations for improving equity in multilingual fact-checking. To investigate retrieval bias, we employ multilingual embedding models and examine the frequency of retrieved claims. Our analysis reveals that certain claims are retrieved disproportionately across different posts, leading to inflated retrieval performance for popular claims while under-representing less common ones.
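Illustration (not from the paper): the abstract describes checking how often each claim is retrieved across posts. One way such a frequency analysis might be sketched is shown below. The function names `retrieval_frequency` and `top_share`, the input layout (a mapping from post ID to its top-k retrieved claim IDs), and the toy data are all assumptions made for this example; the authors' actual procedure and metrics may differ.

```python
from collections import Counter


def retrieval_frequency(top_k_by_post):
    """Count, for each claim ID, how many posts retrieved it in their top-k list."""
    counts = Counter()
    for claim_ids in top_k_by_post.values():
        # Use a set so a claim is counted at most once per post.
        counts.update(set(claim_ids))
    return counts


def top_share(counts, n):
    """Fraction of all retrieval slots occupied by the n most-retrieved claims."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


# Hypothetical toy data: post ID -> top-3 retrieved claim IDs.
top_k_by_post = {
    "post_1": ["c1", "c2", "c3"],
    "post_2": ["c1", "c4", "c2"],
    "post_3": ["c1", "c5", "c6"],
    "post_4": ["c1", "c2", "c7"],
}

counts = retrieval_frequency(top_k_by_post)
share_top2 = top_share(counts, 2)
```

In this toy data, claim `c1` appears in every post's results; a high `top_share` for small `n` would be one rough signal that a few popular claims dominate retrieval.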

Related papers

Language-Coupled Reinforcement Learning for Multilingual Retrieval-Augmented Generation [73.54930910609328]
We propose LcRL, a multilingual search-augmented reinforcement learning framework. LcRL integrates a language-coupled Group Relative Policy Optimization into the policy and reward models. We adopt the language-coupled group sampling in the rollout module to reduce knowledge bias, and regularize an auxiliary anti-consistency penalty in the reward models to mitigate the knowledge conflict.
arXiv Detail & Related papers (2026-01-21T11:32:32Z)
Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG [55.258582772528506]
We investigate whether the mixture of different document languages impacts generation and citation in unintended ways. Across eight languages and six open-weight models, we find that models preferentially cite English sources when queries are in English. We find that models sometimes trade off document relevance for language preference, indicating that citation choices are not always driven by informativeness alone.
arXiv Detail & Related papers (2025-09-17T12:58:18Z)
Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches [8.127643463046516]
We examine strategies to improve the multilingual and crosslingual performance. We evaluate approaches on a dataset containing posts and claims in 47 languages. Most importantly, we show that crosslinguality is a setup with its own unique characteristics compared to the multilingual setup.
arXiv Detail & Related papers (2025-05-28T08:47:10Z)
Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task [73.35882908048423]
Retrieval-augmented generation (RAG) has become a cornerstone of contemporary NLP. This paper investigates the effectiveness of RAG across multiple languages by proposing novel approaches for multilingual open-domain question-answering.
arXiv Detail & Related papers (2025-04-04T17:35:43Z)
Large Language Models for Multilingual Previously Fact-Checked Claim Detection [7.086459223390658]
This paper presents the first comprehensive evaluation of large language models (LLMs) for multilingual previously fact-checked claim detection. We assess seven LLMs across 20 languages in both monolingual and cross-lingual settings. Our results show that while LLMs perform well for high-resource languages, they struggle with low-resource languages.
arXiv Detail & Related papers (2025-03-04T15:56:43Z)
mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval [61.17793165194077]
We introduce mFollowIR, a benchmark for measuring instruction-following ability in retrieval models. We present results for both multilingual (XX-XX) and cross-lingual (En-XX) performance. We see strong cross-lingual performance with English-based retrievers that were trained using instructions, but find a notable drop in performance in the multilingual setting.
arXiv Detail & Related papers (2025-01-31T16:24:46Z)
Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness [30.00463676754559]
We introduce BordIRLines, a dataset of territorial disputes paired with retrieved Wikipedia documents, across 49 languages. We evaluate the cross-lingual robustness of this RAG setting by formalizing several modes for multilingual retrieval. Our experiments show that incorporating perspectives from diverse languages can in fact improve robustness.
arXiv Detail & Related papers (2024-10-02T01:59:07Z)
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context. It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages. It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z)
Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets). We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings. We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks. We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments. We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model. It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z)
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
XTREME (Cross-lingual TRansfer Evaluation of Multilingual Encoders) is a benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.