How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison
- URL: http://arxiv.org/abs/2510.26899v2
- Date: Mon, 03 Nov 2025 12:50:56 GMT
- Title: How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison
- Authors: Taha Yasseri
- Abstract summary: Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia. This study undertakes a large-scale computational comparison of 1,800 matched article pairs between Grokipedia and Wikipedia. Using metrics across lexical richness, readability, structural organization, reference density, and semantic similarity, we assess how closely the two platforms align in form and substance.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The launch of Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia, aiming to produce "truthful" entries via the large language model Grok. Yet whether an AI-driven alternative can escape the biases and limitations of human-edited platforms remains unclear. This study undertakes a large-scale computational comparison of 1,800 matched article pairs between Grokipedia and Wikipedia, drawn from the 2,000 most-edited Wikipedia pages. Using metrics across lexical richness, readability, structural organization, reference density, and semantic similarity, we assess how closely the two platforms align in form and substance. The results show that while Grokipedia exhibits strong semantic and stylistic alignment with Wikipedia, it typically produces longer but less lexically diverse articles, with fewer references per word and greater structural variability. These findings suggest that AI-generated encyclopedic content currently mirrors Wikipedia's informational scope but diverges in editorial norms, favoring narrative expansion over citation-based verification. The implications highlight new tensions around transparency, provenance, and the governance of knowledge in an era of automated text generation.
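The paper does not publish its measurement code, but three of the metric families it names can be sketched in a few lines. The following is a minimal illustration, assuming type-token ratio for lexical richness, references per word for reference density, and a bag-of-words cosine as a crude stand-in for the (likely embedding-based) semantic similarity used in the study:

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; a real pipeline would use a proper tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(text):
    # Lexical richness proxy: unique word types / total tokens.
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def reference_density(text, n_references):
    # References per word, the citation-practice measure the abstract describes.
    tokens = tokenize(text)
    return n_references / len(tokens) if tokens else 0.0

def cosine_similarity(text_a, text_b):
    # Bag-of-words cosine; stands in for embedding similarity here.
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Applied to a matched article pair, these per-article numbers are what make the reported contrasts (longer but less lexically diverse Grokipedia articles, fewer references per word) directly comparable across 1,800 pairs.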
Related papers
- Wikipedia and Grokipedia: A Comparison of Human and Generative Encyclopedias [1.2109519547057517]
We examine how generative mediation alters content selection, textual rewriting, narrative structure, and evaluative framing in encyclopedic content. We model page inclusion in Grokipedia as a function of Wikipedia page popularity, reference density, and recent editorial activity. Rewriting is more frequent for pages with higher reference density and recent controversy, while highly popular pages are more often reproduced without modification.
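Modeling page inclusion as a function of article features suggests a logistic model. A minimal sketch of that setup (the weights and bias below are illustrative placeholders, not fitted values from the paper):

```python
import math

def inclusion_probability(log_popularity, ref_density, recent_edits, weights, bias):
    # Logistic model: P(page included in Grokipedia | Wikipedia features).
    # `weights` and `bias` are hypothetical; the paper's fitted coefficients
    # are not reproduced here.
    z = bias + sum(w * x for w, x in zip(weights, (log_popularity, ref_density, recent_edits)))
    return 1.0 / (1.0 + math.exp(-z))
```

With all features at zero and zero bias the model returns 0.5, the usual logistic baseline before any evidence is applied.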
arXiv Detail & Related papers (2026-02-05T10:24:21Z)
- Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles [56.724847946825285]
We introduce Wiki Live Challenge (WLC), a live benchmark that leverages the newest Wikipedia Good Articles (GAs) as expert-level references. We propose Wiki Eval, a comprehensive evaluation framework comprising a fine-grained evaluation method with 39 criteria for writing quality and rigorous metrics for factual verifiability.
arXiv Detail & Related papers (2026-02-02T03:30:13Z)
- Is Grokipedia Right-Leaning? Comparing Political Framing in Wikipedia and Grokipedia on Controversial Topics [2.374078750219017]
This paper presents a comparative analysis of Wikipedia and Grokipedia on well-established politically contested topics. We find that semantic similarity between the two platforms decays across article sections and diverges more strongly on controversial topics than on randomly sampled ones. We show that both encyclopedias predominantly exhibit left-leaning framings, although Grokipedia exhibits a more bimodal distribution with increased prominence of right-leaning content.
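The section-wise decay finding implies computing one similarity score per aligned section pair and inspecting the resulting profile. A minimal sketch using Jaccard word overlap as a stand-in for whatever section-level similarity the paper computes:

```python
def section_similarity_profile(sections_a, sections_b):
    # One score per aligned section pair; Jaccard overlap of word sets is
    # an illustrative stand-in for an embedding-based similarity.
    profile = []
    for sec_a, sec_b in zip(sections_a, sections_b):
        words_a, words_b = set(sec_a.lower().split()), set(sec_b.lower().split())
        union = words_a | words_b
        profile.append(len(words_a & words_b) / len(union) if union else 0.0)
    return profile
```

A profile that starts high and trends downward over successive sections would correspond to the decay pattern the paper reports; averaging profiles over controversial versus randomly sampled topics would reproduce the second comparison.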
arXiv Detail & Related papers (2026-01-21T21:36:12Z)
- Epistemic Substitution: How Grokipedia's AI-Generated Encyclopedia Restructures Authority [0.0]
A quarter century ago, Wikipedia's decentralized, crowdsourced, and consensus-driven model replaced the centralized, expert-driven, and authority-based standard for encyclopedic knowledge. The emergence of generative-AI encyclopedias such as Grokipedia may represent another shift in curation. This study investigates whether AI- and human-curated encyclopedias rely on the same foundations of authority.
arXiv Detail & Related papers (2025-12-03T01:05:32Z)
- Factual Inconsistencies in Multilingual Wikipedia Tables [5.395647076142643]
This study investigates cross-lingual inconsistencies in Wikipedia's structured content. We develop a methodology to collect, align, and analyze tables from multilingual Wikipedia articles. These insights have implications for factual verification, multilingual knowledge interaction, and the design of reliable AI systems.
arXiv Detail & Related papers (2025-07-24T13:46:14Z)
- Characterizing Knowledge Manipulation in a Russian Wikipedia Fork [18.630486406259426]
The recently launched website Ruwiki copied and modified original Russian Wikipedia content to conform to Russian law. This article presents an in-depth analysis of this Russian Wikipedia fork. We propose a methodology to characterize the main changes with respect to the original version.
arXiv Detail & Related papers (2025-04-14T19:30:30Z)
- QUDsim: Quantifying Discourse Similarities in LLM-Generated Text [70.22275200293964]
We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression. We then use this framework to build QUDsim, a similarity metric that can detect discursive parallels between documents. Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs.
arXiv Detail & Related papers (2025-04-12T23:46:09Z)
- Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset [10.756673240445709]
We first provide a systematic analysis of similarities and discrepancies between legitimate and hoax Wikipedia articles.
We then introduce Hoaxpedia, a collection of 311 hoax articles.
Our results suggest that detecting deceitful content in Wikipedia based on content alone is hard but feasible.
arXiv Detail & Related papers (2024-05-03T15:25:48Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z)
- Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Our first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- Language-agnostic Topic Classification for Wikipedia [1.950869817974852]
We propose a language-agnostic approach based on the links in an article for classifying articles into a taxonomy of topics.
We show that it matches the performance of a language-dependent approach while being simpler and having much greater coverage.
arXiv Detail & Related papers (2021-02-26T22:17:50Z)
- Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.