Lost in Translation -- Multilingual Misinformation and its Evolution
- URL: http://arxiv.org/abs/2310.18089v1
- Date: Fri, 27 Oct 2023 12:21:55 GMT
- Title: Lost in Translation -- Multilingual Misinformation and its Evolution
- Authors: Dorian Quelle, Calvin Cheng, Alexandre Bovet, Scott A. Hale
- Abstract summary: This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages.
We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times.
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
- Score: 52.07628580627591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Misinformation and disinformation are growing threats in the digital age,
spreading rapidly across languages and borders. This paper investigates the
prevalence and dynamics of multilingual misinformation through an analysis of
over 250,000 unique fact-checks spanning 95 languages. First, we find that
while the majority of misinformation claims are only fact-checked once, 11.7%,
corresponding to more than 21,000 claims, are checked multiple times. Using
fact-checks as a proxy for the spread of misinformation, we find 33% of
repeated claims cross linguistic boundaries, suggesting that some
misinformation permeates language barriers. However, spreading patterns exhibit
strong homophily, with misinformation more likely to spread within the same
language. To study the evolution of claims over time and mutations across
languages, we represent fact-checks with multilingual sentence embeddings and
cluster semantically similar claims. We analyze the connected components and
shortest paths connecting different versions of a claim, finding that claims
gradually drift over time and undergo greater alteration when traversing
languages. Overall, this novel investigation of multilingual misinformation
provides key insights. It quantifies redundant fact-checking efforts,
establishes that some claims diffuse across languages, measures linguistic
homophily, and models the temporal and cross-lingual evolution of claims. The
findings advocate for expanded information sharing between fact-checkers
globally while underscoring the importance of localized verification.
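The pipeline the abstract describes (embed fact-checks, link semantically similar ones, then inspect connected components and shortest paths) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 2-D vectors are toy stand-ins for multilingual sentence embeddings, the identifiers such as `en_1` are hypothetical, and the similarity threshold of 0.85 is an assumed value that does not come from the paper.

```python
from collections import deque
from math import sqrt

# Toy 2-D vectors standing in for multilingual sentence embeddings.
# Assumption: a real pipeline would get these from a multilingual encoder.
FACT_CHECKS = {
    "en_1": (1.0, 0.0),
    "es_1": (0.9, 0.3),
    "hi_1": (0.7, 0.7),
    "fr_9": (-1.0, 0.1),
}
THRESHOLD = 0.85  # hypothetical similarity cutoff, not taken from the paper


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def build_graph(items, threshold):
    """Link every pair of fact-checks whose similarity reaches the threshold."""
    graph = {key: set() for key in items}
    keys = list(items)
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if cosine(items[a], items[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def connected_components(graph):
    """Group fact-checks into clusters of (transitively) linked claims."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(FACT_CHECKS, THRESHOLD)
print(connected_components(graph))
print(shortest_path(graph, "en_1", "hi_1"))
```

In this toy data, `en_1` and `hi_1` fall below the threshold when compared directly but still land in the same component through `es_1`, which loosely mirrors the idea of a claim drifting as it passes between languages.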
Related papers
- A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification [1.566834021297545]
This study systematically evaluates translation bias and the effectiveness of Large Language Models for cross-lingual claim verification.
We investigate two distinct translation methods: pre-translation and self-translation.
Our findings reveal that low-resource languages exhibit significantly lower accuracy in direct inference due to underrepresentation.
arXiv Detail & Related papers (2024-10-14T09:02:42Z)
- Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level.
We evaluate InfoGap by analyzing LGBT people's portrayals, across 2.7K biography pages on English, Russian, and French Wikipedias.
arXiv Detail & Related papers (2024-10-05T20:40:49Z)
- Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research [7.242609314791262]
We present state-of-the-art multilingual claim detection research, categorized by three key factors of the problem: verifiability, priority, and similarity.
We present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
arXiv Detail & Related papers (2024-01-22T14:17:03Z)
- Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z)
- Breaking Language Barriers with MMTweets: Advancing Cross-Lingual Debunked Narrative Retrieval for Fact-Checking [5.880794128275313]
Cross-lingual debunked narrative retrieval is an understudied problem.
This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by creating Multilingual Misinformation Tweets (MMTweets).
MMTweets features cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts.
We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models.
arXiv Detail & Related papers (2023-08-10T16:33:17Z)
- CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT).
arXiv Detail & Related papers (2022-09-05T17:36:14Z)
- Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets).
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z)
- Multilingual Evidence Retrieval and Fact Verification to Combat Global Disinformation: The Power of Polyglotism [0.0]
This article investigates multilingual evidence retrieval and fact verification as a step to combat global disinformation.
The goal is to build multilingual systems that retrieve evidence in evidence-rich languages to verify claims in evidence-poor languages.
arXiv Detail & Related papers (2020-12-16T13:10:56Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.