WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions
- URL: http://arxiv.org/abs/2505.24195v2
- Date: Wed, 04 Jun 2025 19:04:56 GMT
- Authors: Zining Wang, Yuxuan Zhang, Dongwook Yoon, Nicholas Vincent, Farhan Samir, Vered Shwartz
- Abstract summary: We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With more than 11 times as many pageviews as the next most-viewed edition, English Wikipedia dominates global knowledge access relative to other language editions. Readers are prone to assuming that English Wikipedia is a superset of all language editions, leading many to prefer it even when their primary language is not English. Other language editions, however, contain complementary facts rooted in their respective cultures and media environments, which are marginalized in English Wikipedia. While Wikipedia's user interface enables switching between language editions through its Interlanguage Link (ILL) system, it does not reveal to readers that other language editions contain valuable, complementary information. We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia. In a mixed-methods study (n=21), WikiGap significantly improved fact-finding accuracy, reduced task time, and received a 32-point higher usability score relative to Wikipedia's current ILL-based navigation system. Participants reported increased awareness of the availability of complementary information in non-English editions and reconsidered the completeness of English Wikipedia. WikiGap thus paves the way for improved epistemic equity across language editions.
Related papers
- Factual Inconsistencies in Multilingual Wikipedia Tables [5.395647076142643]
This study investigates cross-lingual inconsistencies in Wikipedia's structured content. We develop a methodology to collect, align, and analyze tables from Wikipedia multilingual articles. These insights have implications for factual verification, multilingual knowledge interaction, and design for reliable AI systems.
arXiv Detail & Related papers (2025-07-24T13:46:14Z)
- Web2Wiki: Characterizing Wikipedia Linking Across the Web [19.00204665059246]
We identify over 90 million Wikipedia links spanning 1.68% of Web domains. Wikipedia is most frequently cited by news and science websites for informational purposes. Most links serve as explanatory references rather than as evidence or attribution.
arXiv Detail & Related papers (2025-05-17T00:52:24Z)
- On the effective transfer of knowledge from English to Hindi Wikipedia [4.427603894929721]
We propose a lightweight framework to enhance knowledge equity between English and Hindi. In case the English Wikipedia page is not up-to-date, our framework adapts it to align with Wikipedia's distinctive style. Our framework effectively generates new content for Hindi Wikipedia sections, enhancing Hindi Wikipedia articles by 65% and 62% according to automatic and human judgment-based evaluations, respectively.
arXiv Detail & Related papers (2024-12-07T17:43:21Z)
- Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia [49.80565462746646]
We introduce the InfoGap method -- an efficient and reliable approach to locating information gaps and inconsistencies in articles at the fact level.
We evaluate InfoGap by analyzing LGBT people's portrayals across 2.7K biography pages on English, Russian, and French Wikipedias.
arXiv Detail & Related papers (2024-10-05T20:40:49Z)
- An Open Multilingual System for Scoring Readability of Wikipedia [3.992677070507323]
We develop a multilingual model to score the readability of Wikipedia articles.
We create a novel multilingual dataset spanning 14 languages by matching articles from Wikipedia to simplified Wikipedia and online children's encyclopedias.
We show that our model performs well in a zero-shot scenario, yielding a ranking accuracy of more than 80% across 14 languages.
arXiv Detail & Related papers (2024-06-03T23:07:18Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- Crosslingual Topic Modeling with WikiPDA [15.198979978589476]
We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA).
It learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics.
We show its utility in two applications: a study of topical biases in 28 Wikipedia editions, and crosslingual supervised classification.
arXiv Detail & Related papers (2020-09-23T15:19:27Z)
- Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z)
- Design Challenges in Low-resource Cross-lingual Entity Linking [56.18957576362098]
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
arXiv Detail & Related papers (2020-05-02T04:00:26Z)
- Architecture for a multilingual Wikipedia [0.0]
We argue that we need a new approach to tackle this problem more effectively.
This paper proposes an architecture for a system that fulfills this goal.
It separates the goal into two parts: creating and maintaining content in an abstract notation within a project called Abstract Wikipedia, and creating an infrastructure called Wikilambda that can translate this notation to natural language.
arXiv Detail & Related papers (2020-04-08T22:25:10Z)