A Comparative Study of Reference Reliability in Multiple Language
Editions of Wikipedia
- URL: http://arxiv.org/abs/2309.00196v2
- Date: Mon, 4 Sep 2023 10:25:48 GMT
- Title: A Comparative Study of Reference Reliability in Multiple Language
Editions of Wikipedia
- Authors: Aitolkyn Baigutanova, Diego Saez-Trumper, Miriam Redi, Meeyoung Cha,
Pablo Aragón
- Abstract summary: This study examines over 5 million Wikipedia articles to assess the reliability of references in multiple language editions.
Some sources deemed untrustworthy in one language (i.e., English) continue to appear in articles in other languages.
Non-authoritative sources found in the English version of a page tend to persist in other language versions of that page.
- Score: 12.919146538916353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information presented in Wikipedia articles must be attributable to reliable
published sources in the form of references. This study examines over 5 million
Wikipedia articles to assess the reliability of references in multiple language
editions. We quantify the cross-lingual patterns of the perennial sources list,
a collection of reliability labels for web domains identified and
collaboratively agreed upon by Wikipedia editors. We discover that some sources
(or web domains) deemed untrustworthy in one language (i.e., English) continue
to appear in articles in other languages. This trend is especially evident with
sources tailored for smaller communities. Furthermore, non-authoritative
sources found in the English version of a page tend to persist in other
language versions of that page. We finally present a case study on the Chinese,
Russian, and Swedish Wikipedias to demonstrate a discrepancy in reference
reliability across cultures. Our findings highlight future challenges in
coordinating global knowledge on source reliability.
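As a minimal illustration of the approach the abstract describes, the sketch below checks reference URLs against a perennial-sources style label map. The domain labels and data here are hypothetical placeholders, not entries from the actual Wikipedia list:

```python
from urllib.parse import urlparse

# Hypothetical excerpt of a perennial-sources list: domain -> reliability label.
PERENNIAL_LABELS = {
    "nature.com": "generally_reliable",
    "example-tabloid.com": "deprecated",
}

def domain_of(url: str) -> str:
    """Extract the host from a reference URL, with a leading 'www.' stripped."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def label_references(reference_urls):
    """Map each reference URL to its reliability label, defaulting to 'unlisted'."""
    return {url: PERENNIAL_LABELS.get(domain_of(url), "unlisted")
            for url in reference_urls}

refs = [
    "https://www.nature.com/articles/x",
    "http://example-tabloid.com/story",
    "https://unknown-site.org/page",
]
labels = label_references(refs)
```

Applying the same label map across language editions is what lets the study detect domains deprecated in English that still appear elsewhere.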
Related papers
- Language-Agnostic Modeling of Source Reliability on Wikipedia [2.6474867060112346]
We present a language-agnostic model designed to assess the reliability of sources across multiple language editions of Wikipedia.
The model effectively predicts source reliability, achieving an F1 Macro score of approximately 0.80 for English.
We highlight the challenge of maintaining consistent model performance across languages of varying resource levels.
arXiv Detail & Related papers (2024-10-24T14:52:21Z) - Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level.
We evaluate InfoGap by analyzing LGBT people's portrayals, across 2.7K biography pages on English, Russian, and French Wikipedias.
arXiv Detail & Related papers (2024-10-05T20:40:49Z) - An Open Multilingual System for Scoring Readability of Wikipedia [3.992677070507323]
We develop a multilingual model to score the readability of Wikipedia articles.
We create a novel multilingual dataset spanning 14 languages, by matching articles from Wikipedia to simplified Wikipedia and online children's encyclopedias.
We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages.
arXiv Detail & Related papers (2024-06-03T23:07:18Z) - Lost in Translation -- Multilingual Misinformation and its Evolution [52.07628580627591]
This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages.
We find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times.
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries.
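As a toy illustration of the two statistics quoted above, the sketch below computes the share of repeatedly checked claims and, among those, the share crossing language boundaries. The record format and data are hypothetical:

```python
from collections import defaultdict

# Hypothetical fact-check records: (claim_id, language) per fact-check.
fact_checks = [
    ("c1", "en"), ("c1", "es"),   # checked twice, crosses languages
    ("c2", "en"), ("c2", "en"),   # checked twice, same language
    ("c3", "fr"),                 # checked once
]

langs_by_claim = defaultdict(set)
checks_by_claim = defaultdict(int)
for claim, lang in fact_checks:
    langs_by_claim[claim].add(lang)
    checks_by_claim[claim] += 1

repeated = [c for c, n in checks_by_claim.items() if n > 1]
share_repeated = len(repeated) / len(checks_by_claim)
cross_lingual = [c for c in repeated if len(langs_by_claim[c]) > 1]
share_cross = len(cross_lingual) / len(repeated) if repeated else 0.0
```

On this toy input, 2 of 3 claims are checked multiple times and 1 of those 2 crosses a language boundary, mirroring the paper's 11.7% and 33% figures in structure if not in value.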
arXiv Detail & Related papers (2023-10-27T12:21:55Z) - Longitudinal Assessment of Reference Quality on Wikipedia [7.823541290904653]
This work analyzes the reliability of this global encyclopedia through the lens of its references.
We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references.
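The two metrics defined above can be sketched directly from their definitions. The article representation here is a hypothetical simplification: each sentence is a flag for whether it carries a citation, and each reference is a flag for whether it is non-authoritative:

```python
def reference_need(sentence_has_citation):
    """RN: share of sentences missing a citation.

    sentence_has_citation: list of bools, True if the sentence is cited.
    """
    if not sentence_has_citation:
        return 0.0
    missing = sum(1 for has_cite in sentence_has_citation if not has_cite)
    return missing / len(sentence_has_citation)

def reference_risk(reference_is_non_authoritative):
    """RR: share of references pointing to non-authoritative sources.

    reference_is_non_authoritative: list of bools, True if flagged risky.
    """
    if not reference_is_non_authoritative:
        return 0.0
    return sum(reference_is_non_authoritative) / len(reference_is_non_authoritative)
```

For example, an article where 2 of 4 sentences lack a citation has RN = 0.5, and one where 1 of 4 references is non-authoritative has RR = 0.25.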
arXiv Detail & Related papers (2023-03-09T13:04:14Z) - Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Our system's first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z) - Assessing the quality of sources in Wikidata across languages: a hybrid
approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z) - Multiple Texts as a Limiting Factor in Online Learning: Quantifying
(Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z) - Design Challenges in Low-resource Cross-lingual Entity Linking [56.18957576362098]
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
arXiv Detail & Related papers (2020-05-02T04:00:26Z) - Quantifying Engagement with Citations on Wikipedia [13.703047949952852]
One in 300 page views results in a reference click.
Clicks occur more frequently on shorter pages and on pages of lower quality.
Recent content, open access sources and references about life events are particularly popular.
arXiv Detail & Related papers (2020-01-23T15:52:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.