Related papers: Lost in Translation -- Multilingual Misinformation and its Evolution

Lost in Translation -- Multilingual Misinformation and its Evolution

URL: http://arxiv.org/abs/2310.18089v1
Date: Fri, 27 Oct 2023 12:21:55 GMT
Title: Lost in Translation -- Multilingual Misinformation and its Evolution
Authors: Dorian Quelle, Calvin Cheng, Alexandre Bovet, Scott A. Hale
Abstract summary: This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages. We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times. Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
Score: 52.07628580627591
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Misinformation and disinformation are growing threats in the digital age, spreading rapidly across languages and borders. This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages. First, we find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times. Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries, suggesting that some misinformation permeates language barriers. However, spreading patterns exhibit strong homophily, with misinformation more likely to spread within the same language. To study the evolution of claims over time and mutations across languages, we represent fact-checks with multilingual sentence embeddings and cluster semantically similar claims. We analyze the connected components and shortest paths connecting different versions of a claim finding that claims gradually drift over time and undergo greater alteration when traversing languages. Overall, this novel investigation of multilingual misinformation provides key insights. It quantifies redundant fact-checking efforts, establishes that some claims diffuse across languages, measures linguistic homophily, and models the temporal and cross-lingual evolution of claims. The findings advocate for expanded information sharing between fact-checkers globally while underscoring the importance of localized verification.

Related papers

When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training [57.230355403478995]
We investigate the development of language-agnostic concept spaces during pretraining of EuroLLM.<n>We find that shared concept spaces emerge early and continue to refine, but that alignment with them is language-dependent.<n>In contrast to prior work, our fine-grained manual analysis reveals that some apparent gains in translation quality reflect shifts in behavior.
arXiv Detail & Related papers (2026-01-30T11:23:01Z)
Rethinking Cross-lingual Gaps from a Statistical Viewpoint [24.938886584535926]
Large Language Models (LLMs) act as a bridge by acquiring knowledge from a source language and making it accessible when queried from target languages.<n>Prior research has pointed to a cross-lingual gap, viz., a drop in accuracy when the knowledge is queried in a target language compared to when the query is in the source language.<n>We take an alternative view and hypothesize that the variance of responses in the target language is the main cause of this gap.
arXiv Detail & Related papers (2025-10-17T11:34:04Z)
Tracing Multilingual Factual Knowledge Acquisition in Pretraining [62.95057983661562]
Large Language Models (LLMs) are capable of recalling multilingual factual knowledge present in their pretraining data.<n>We trace how factual recall and crosslingual consistency evolve during pretraining, focusing on OLMo-7B.<n>We find that both accuracy and consistency improve over time for most languages.
arXiv Detail & Related papers (2025-05-20T18:39:56Z)
Lost in Multilinguality: Dissecting Cross-lingual Factual Inconsistency in Transformer Language Models [49.16690802656554]
We find that Multilingual factual models struggle to provide consistent responses to semantically equivalent prompts in different languages. We propose a linear shortcut method that bypasses computations in the final layers, enhancing both prediction accuracy and cross-lingual consistency.
arXiv Detail & Related papers (2025-04-05T19:43:10Z)
Large Language Models for Multilingual Previously Fact-Checked Claim Detection [3.694429692322632]
This paper presents the first comprehensive evaluation of large language models (LLMs) for multilingual previously fact-checked claim detection. We assess seven LLMs across 20 languages in both monolingual and cross-lingual settings. Our results show that while LLMs perform well for high-resource languages, they struggle with low-resource languages.
arXiv Detail & Related papers (2025-03-04T15:56:43Z)
A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification [1.566834021297545]
This study systematically evaluates translation bias and the effectiveness of Large Language Models for cross-lingual claim verification. We investigate two distinct translation methods: pre-translation and self-translation. Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation.
arXiv Detail & Related papers (2024-10-14T09:02:42Z)
Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level. We evaluate InfoGap by analyzing LGBT people's portrayals, across 2.7K biography pages on English, Russian, and French Wikipedias.
arXiv Detail & Related papers (2024-10-05T20:40:49Z)
Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research [7.242609314791262]
We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem, verifiability, priority, and similarity. We present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
arXiv Detail & Related papers (2024-01-22T14:17:03Z)
Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context. It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages. It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z)
Breaking Language Barriers with MMTweets: Advancing Cross-Lingual Debunked Narrative Retrieval for Fact-Checking [5.880794128275313]
Cross-lingual debunked narrative retrieval is an understudied problem. This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by: (i) creating Multilingual Misinformation Tweets (MMTweets) MMTweets features cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts. We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models.
arXiv Detail & Related papers (2023-08-10T16:33:17Z)
CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages. We present the first fact-checking framework augmented with crosslingual retrieval. We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT)
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets) We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings. We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism [0.0]
This article investigates multilingual evidence retrieval and fact verification as a step to combat global disinformation. The goal is building multilingual systems that retrieve in evidence-rich languages to verify claims in evidence-poor languages.
arXiv Detail & Related papers (2020-12-16T13:10:56Z)
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source. We observe that our representations embed typology and strengthen correlations with language relationships. We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.