Lost in Translation -- Multilingual Misinformation and its Evolution
- URL: http://arxiv.org/abs/2310.18089v1
- Date: Fri, 27 Oct 2023 12:21:55 GMT
- Title: Lost in Translation -- Multilingual Misinformation and its Evolution
- Authors: Dorian Quelle, Calvin Cheng, Alexandre Bovet, Scott A. Hale
- Abstract summary: This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages.
We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times.
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
- Score: 52.07628580627591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Misinformation and disinformation are growing threats in the digital age,
spreading rapidly across languages and borders. This paper investigates the
prevalence and dynamics of multilingual misinformation through an analysis of
over 250,000 unique fact-checks spanning 95 languages. First, we find that
while the majority of misinformation claims are only fact-checked once, 11.7%,
corresponding to more than 21,000 claims, are checked multiple times. Using
fact-checks as a proxy for the spread of misinformation, we find 33% of
repeated claims cross linguistic boundaries, suggesting that some
misinformation permeates language barriers. However, spreading patterns exhibit
strong homophily, with misinformation more likely to spread within the same
language. To study the evolution of claims over time and mutations across
languages, we represent fact-checks with multilingual sentence embeddings and
cluster semantically similar claims. We analyze the connected components and
shortest paths connecting different versions of a claim finding that claims
gradually drift over time and undergo greater alteration when traversing
languages. Overall, this novel investigation of multilingual misinformation
provides key insights. It quantifies redundant fact-checking efforts,
establishes that some claims diffuse across languages, measures linguistic
homophily, and models the temporal and cross-lingual evolution of claims. The
findings advocate for expanded information sharing between fact-checkers
globally while underscoring the importance of localized verification.
Related papers
- Claim Detection for Automated Fact-checking: A Survey on Monolingual, Multilingual and Cross-Lingual Research [7.242609314791262]
We present state-of-the-art multilingual claim detection research categorized into three key factors of the problem, verifiability, priority, and similarity.
We present a detailed overview of the existing multilingual datasets along with the challenges and suggest possible future advancements.
arXiv Detail & Related papers (2024-01-22T14:17:03Z) - Multi-EuP: The Multilingual European Parliament Dataset for Analysis of
Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z) - Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z) - Multilingual Previously Fact-Checked Claim Retrieval [1.4884363206251627]
This paper introduces a new multilingual dataset -- MultiClaim -- for fact-checked claim retrieval.
We collected 28k posts in 27 languages from social media, 206k fact-checks in 39 languages written by professional fact-checkers.
We evaluated how different unsupervised methods fare on this dataset and its various dimensions.
arXiv Detail & Related papers (2023-05-13T20:00:18Z) - Cross-lingual Transfer Learning for Check-worthy Claim Identification
over Twitter [7.601937548486356]
Misinformation spread over social media has become an undeniable infodemic.
We present a systematic study of six approaches for cross-lingual check-worthiness estimation across pairs of five diverse languages with the help of Multilingual BERT (mBERT) model.
Our results show that for some language pairs, zero-shot cross-lingual transfer is possible and can perform as good as monolingual models that are trained on the target language.
arXiv Detail & Related papers (2022-11-09T18:18:53Z) - CONCRETE: Improving Cross-lingual Fact-checking with Cross-lingual
Retrieval [73.48591773882052]
Most fact-checking approaches focus on English only due to the data scarcity issue in other languages.
We present the first fact-checking framework augmented with crosslingual retrieval.
We train the retriever with our proposed Crosslingual Inverse Cloze Task (XICT)
arXiv Detail & Related papers (2022-09-05T17:36:14Z) - Matching Tweets With Applicable Fact-Checks Across Languages [27.762055254009017]
We focus on automatically finding existing fact-checks for claims made in social media posts (tweets)
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings.
We present promising results for "match" classification (93% average accuracy) in four language pairs.
arXiv Detail & Related papers (2022-02-14T23:33:02Z) - Multilingual Evidence Retrieval and Fact Verification to Combat Global
Disinformation: The Power of Polyglotism [0.0]
This article investigates multilingual evidence retrieval and fact verification as a step to combat global disinformation.
The goal is building multilingual systems that retrieve in evidence-rich languages to verify claims in evidence-poor languages.
arXiv Detail & Related papers (2020-12-16T13:10:56Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.