Characterizing Knowledge Manipulation in a Russian Wikipedia Fork
- URL: http://arxiv.org/abs/2504.10663v2
- Date: Mon, 21 Apr 2025 05:07:13 GMT
- Title: Characterizing Knowledge Manipulation in a Russian Wikipedia Fork
- Authors: Mykola Trokhymovych, Oleksandr Kosovan, Nathan Forrester, Pablo Aragón, Diego Saez-Trumper, Ricardo Baeza-Yates
- Abstract summary: The recently launched website Ruwiki copied and modified the original Russian Wikipedia content to conform to Russian law. This article presents an in-depth analysis of this Russian Wikipedia fork. We propose a methodology to characterize the main changes with respect to the original version.
- Score: 18.630486406259426
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Wikipedia is powered by MediaWiki, a free and open-source software that is also the infrastructure for many other wiki-based online encyclopedias. These include the recently launched website Ruwiki, which has copied and modified the original Russian Wikipedia content to conform to Russian law. To identify practices and narratives that could be associated with different forms of knowledge manipulation, this article presents an in-depth analysis of this Russian Wikipedia fork. We propose a methodology to characterize the main changes with respect to the original version. The foundation of this study is a comprehensive comparative analysis of more than 1.9M articles from Russian Wikipedia and its fork. Using meta-information and geographical, temporal, categorical, and textual features, we explore the changes made by Ruwiki editors. Furthermore, we present a classification of the main topics of knowledge manipulation in this fork, including a numerical estimation of their scope. This research not only sheds light on significant changes within Ruwiki, but also provides a methodology that could be applied to analyze other Wikipedia forks and similar collaborative projects.
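The core of the methodology is a pairwise comparison of articles that exist in both Russian Wikipedia and the fork. As a minimal sketch of that idea (not the paper's actual pipeline: the loading helper, the threshold, and the character-level diff metric are illustrative assumptions), one can pair articles by title and flag those whose text diverges:

```python
# Minimal sketch of pairwise article comparison between Russian Wikipedia
# and its fork. The input format and the similarity threshold are
# illustrative assumptions, not the paper's actual pipeline.
import difflib

def change_ratio(original: str, fork: str) -> float:
    """Fraction of text that differs between the two article versions."""
    return 1.0 - difflib.SequenceMatcher(None, original, fork).ratio()

def find_modified_articles(pairs, threshold=0.01):
    """Yield (title, ratio) for article pairs whose text diverges."""
    for title, original_text, fork_text in pairs:
        ratio = change_ratio(original_text, fork_text)
        if ratio > threshold:
            yield title, ratio

# Example: a single hypothetical pair.
pairs = [("Example", "Original sentence.", "Modified sentence.")]
for title, ratio in find_modified_articles(pairs):
    print(f"{title}: {ratio:.2%} of characters changed")
```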
Related papers
- Science Fiction and Fantasy in Wikipedia: Exploring Structural and Semantic Cues [0.0]
Identifying which Wikipedia articles are related to science fiction, fantasy, or their hybrids is challenging because genre boundaries are porous and frequently overlap. This study examines structural and semantic features of Wikipedia articles that can be used to identify content related to science fiction and fantasy (SF/F).
arXiv Detail & Related papers (2026-02-27T17:56:25Z) - Wikipedia and Grokipedia: A Comparison of Human and Generative Encyclopedias [1.2109519547057517]
We examine how generative mediation alters content selection, textual rewriting, narrative structure, and evaluative framing in encyclopedic content. We model page inclusion in Grokipedia as a function of Wikipedia page popularity, density of references, and recent editorial activity. Rewriting is more frequent for pages with higher reference density and recent controversy, while highly popular pages are more often reproduced without modification.
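The inclusion model described above lends itself to a standard binary classifier; a logistic regression is one natural choice, though the paper's exact specification is an assumption here, as are the feature names and the toy data:

```python
# Hedged sketch of the inclusion model: a logistic regression predicting
# whether a Wikipedia page appears in Grokipedia. Toy data for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: log page views, references per kB, edits in last 90 days.
X = np.array([
    [12.1, 3.2, 40],
    [ 8.4, 0.9,  2],
    [10.7, 2.5, 15],
    [ 6.2, 0.4,  0],
])
y = np.array([1, 0, 1, 0])  # 1 = page included in the fork

model = LogisticRegression().fit(X, y)
print(model.coef_)  # per-feature log-odds contributions
```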
arXiv Detail & Related papers (2026-02-05T10:24:21Z) - How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison [0.0]
Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia. This study undertakes a large-scale computational comparison of 1,800 matched article pairs between Grokipedia and Wikipedia. Using metrics across lexical richness, readability, structural organization, reference density, and semantic similarity, we assess how closely the two platforms align in form and substance.
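Two of the named metric families are easy to illustrate. The sketch below uses type-token ratio as a lexical-richness proxy and TF-IDF cosine as a semantic-similarity proxy; the paper's exact metric definitions may differ:

```python
# Illustrative proxies for two comparison dimensions: lexical richness
# (type-token ratio) and semantic similarity (TF-IDF cosine).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def type_token_ratio(text: str) -> float:
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def semantic_similarity(a: str, b: str) -> float:
    tfidf = TfidfVectorizer().fit_transform([a, b])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

wiki = "Wikipedia is a collaboratively edited encyclopedia."
grok = "Grokipedia is an AI-generated encyclopedia."
print(type_token_ratio(wiki), semantic_similarity(wiki, grok))
```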
arXiv Detail & Related papers (2025-10-30T18:04:46Z) - Factual Inconsistencies in Multilingual Wikipedia Tables [5.395647076142643]
This study investigates cross-lingual inconsistencies in Wikipedia's structured content. We develop a methodology to collect, align, and analyze tables from multilingual Wikipedia articles. These insights have implications for factual verification, multilingual knowledge interaction, and the design of reliable AI systems.
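A toy version of the alignment-and-comparison step: match rows of the "same" table from two language editions on a shared key column and flag cells that disagree. The dict-based table representation is an assumption for illustration; the paper's alignment is more involved:

```python
# Toy sketch of cross-lingual table comparison.
def compare_tables(table_a, table_b, key):
    """Each table is a list of dicts; yields (key, column, val_a, val_b)."""
    index_b = {row[key]: row for row in table_b if key in row}
    for row in table_a:
        other = index_b.get(row.get(key))
        if other is None:
            continue
        for col, val in row.items():
            if col != key and other.get(col) not in (None, val):
                yield row[key], col, val, other[col]

en = [{"country": "France", "population": "68M"}]
de = [{"country": "France", "population": "67M"}]
print(list(compare_tables(en, de, key="country")))
```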
arXiv Detail & Related papers (2025-07-24T13:46:14Z) - WikiGap: Promoting Epistemic Equity by Surfacing Knowledge Gaps Between English Wikipedia and other Language Editions [31.58588164648108]
We present WikiGap, a system that surfaces complementary facts sourced from other Wikipedias within the English Wikipedia interface. Specifically, by combining a recent multilingual information-gap discovery method with a user-centered design, WikiGap enables access to complementary information from French, Russian, and Chinese Wikipedia.
arXiv Detail & Related papers (2025-05-30T04:14:03Z) - Web2Wiki: Characterizing Wikipedia Linking Across the Web [19.00204665059246]
We identify over 90 million Wikipedia links spanning 1.68% of Web domains. Wikipedia is most frequently cited by news and science websites for informational purposes. Most links serve as explanatory references rather than as evidence or attribution.
arXiv Detail & Related papers (2025-05-17T00:52:24Z) - Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset [10.756673240445709]
We first provide a systematic analysis of similarities and discrepancies between legitimate and hoax Wikipedia articles.
We then introduce Hoaxpedia, a collection of 311 hoax articles.
Our results suggest that detecting deceitful content in Wikipedia based on content alone is hard but feasible.
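Framing the task as binary text classification makes the "content alone" setting concrete. A minimal sketch, with purely illustrative training snippets rather than Hoaxpedia data:

```python
# Minimal content-only hoax detection as binary text classification.
# The four training snippets are purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Sourced biography with references and a stable revision history.",
    "Well-cited article on a documented historical event.",
    "Unreferenced article about an alleged secret medieval kingdom.",
    "Article on a supposed inventor with no citations anywhere.",
]
labels = [0, 0, 1, 1]  # 1 = hoax

clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)
print(clf.predict(["An uncited page about a fictitious island nation."]))
```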
arXiv Detail & Related papers (2024-05-03T15:25:48Z) - AKEW: Assessing Knowledge Editing in the Wild [79.96813982502952]
AKEW (Assessing Knowledge Editing in the Wild) is a new practical benchmark for knowledge editing.
It fully covers three editing settings for knowledge updates: structured facts, unstructured texts as facts, and extracted triplets.
Through extensive experiments, we demonstrate the considerable gap between state-of-the-art knowledge-editing methods and practical scenarios.
arXiv Detail & Related papers (2024-02-29T07:08:34Z) - Orphan Articles: The Dark Matter of Wikipedia [13.290424502717734]
We conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles.
We find that a surprisingly large share of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia.
We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase of their visibility.
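The orphan definition is simple to operationalize: an article is an orphan when no other article links to it. A minimal sketch over a toy link graph standing in for the full Wikipedia link table:

```python
# Orphan detection on a toy link graph.
def find_orphans(links: dict[str, set[str]]) -> set[str]:
    """links maps each article to the set of articles it links to."""
    all_articles = set(links)
    has_incoming = set().union(*links.values()) if links else set()
    return all_articles - has_incoming

graph = {
    "A": {"B"},
    "B": {"A"},
    "C": set(),   # nothing links to C -> orphan
}
print(find_orphans(graph))  # {'C'}
```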
arXiv Detail & Related papers (2023-06-06T18:04:33Z) - Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
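The simplest possible instance of this mapping is template-based verbalization of a single statement. The sketch below is illustrative only; the paper's mapping is corpus-driven rather than hand-written:

```python
# Template-based verbalization of a Wikidata-style triple.
TEMPLATES = {
    "P19": "{subject} was born in {object}.",   # place of birth
    "P106": "{subject} works as a {object}.",   # occupation
}

def verbalize(subject: str, prop: str, obj: str) -> str:
    template = TEMPLATES.get(prop, "{subject} has {object}.")
    return template.format(subject=subject, object=obj)

print(verbalize("Marie Curie", "P19", "Warsaw"))
# Marie Curie was born in Warsaw.
```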
arXiv Detail & Related papers (2022-10-23T08:34:33Z) - WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z) - Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined in a two-stage extractive and abstractive approach to Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z) - A preliminary approach to knowledge integrity risk assessment in Wikipedia projects [0.0]
We introduce a taxonomy of knowledge integrity risks across Wikipedia projects and a first set of indicators to assess internal risks related to community and content issues.
On top of this taxonomy, we offer a preliminary analysis illustrating how the lack of editors' geographical diversity might represent a knowledge integrity risk.
These are the first steps of a research project to build a Wikipedia Knowledge Integrity Risk Observatory.
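One way to make a geographical-diversity indicator concrete is the Shannon entropy of the editor-country distribution, where a low value signals geographic concentration. This is a hedged sketch, not necessarily the paper's indicator definition:

```python
# Shannon entropy of editor countries as a diversity indicator.
import math
from collections import Counter

def geo_entropy(editor_countries: list[str]) -> float:
    counts = Counter(editor_countries)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(geo_entropy(["US", "US", "DE", "IN", "US"]))  # low = concentrated
```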
arXiv Detail & Related papers (2021-06-30T09:47:27Z) - Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z) - Design Challenges in Low-resource Cross-lingual Entity Linking [56.18957576362098]
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia.
This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention.
We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search-engine query logs.
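A hedged sketch of the query-log idea: rank the English Wikipedia titles that users reached after issuing the foreign-language mention as a query. The (query, clicked title) log format is a hypothetical stand-in for real search-engine logs:

```python
# Query-log-based candidate generation for cross-lingual entity linking.
from collections import Counter

def candidates(mention: str, query_log, top_k: int = 3):
    """query_log: iterable of (query, clicked_english_title) pairs."""
    counts = Counter(
        title for query, title in query_log if query == mention
    )
    return [title for title, _ in counts.most_common(top_k)]

log = [
    ("París", "Paris"),
    ("París", "Paris"),
    ("París", "Paris,_Texas"),
]
print(candidates("París", log))  # ['Paris', 'Paris,_Texas']
```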
arXiv Detail & Related papers (2020-05-02T04:00:26Z) - Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
arXiv Detail & Related papers (2020-03-11T07:48:46Z) - WikiHist.html: English Wikipedia's Full Revision History in HTML Format [12.86558129722198]
We develop a parallelized architecture for parsing massive amounts of wikitext using local instances of MediaWiki.
We highlight the advantages of WikiHist.html over raw wikitext in an empirical analysis of Wikipedia's hyperlinks.
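The parallelization idea is straightforward to sketch: fan article wikitext out to worker processes and extract links from each. The regex extractor below is a deliberate simplification of full MediaWiki parsing:

```python
# Parallel link extraction from wikitext with a simplified regex parser.
import re
from multiprocessing import Pool

LINK_RE = re.compile(r"\[\[([^\]|#]+)")

def extract_links(wikitext: str) -> list[str]:
    return LINK_RE.findall(wikitext)

if __name__ == "__main__":
    articles = [
        "See [[Encyclopedia]] and [[Wiki|wikis]].",
        "Links: [[MediaWiki]].",
    ]
    with Pool() as pool:
        print(pool.map(extract_links, articles))
```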
arXiv Detail & Related papers (2020-01-28T10:44:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.