Wikipedia and Grokipedia: A Comparison of Human and Generative Encyclopedias
- URL: http://arxiv.org/abs/2602.05519v1
- Date: Thu, 05 Feb 2026 10:24:21 GMT
- Title: Wikipedia and Grokipedia: A Comparison of Human and Generative Encyclopedias
- Authors: Ortal Hadad, Edoardo Loru, Jacopo Nudo, Anita Bonetti, Matteo Cinelli, Walter Quattrociocchi,
- Abstract summary: We examine how generative mediation alters content selection, textual rewriting, narrative structure, and evaluative framing in encyclopedic content.<n>We model page inclusion in Grokipedia as a function of Wikipedia page popularity, density of reference, and recent editorial activity.<n>Rewriting is more frequent for pages with higher reference density and recent controversy, while highly popular pages are more often reproduced without modification.
- Score: 1.2109519547057517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a comparative analysis of Wikipedia and Grokipedia to examine how generative mediation alters content selection, textual rewriting, narrative structure, and evaluative framing in encyclopedic content. We model page inclusion in Grokipedia as a function of Wikipedia page popularity, density of reference, and recent editorial activity. Inclusion is non-uniform: pages with higher visibility and greater editorial conflict in Wikipedia are more likely to appear in Grokipedia. For included pages, we distinguish between verbatim reproduction and generative rewriting. Rewriting is more frequent for pages with higher reference density and recent controversy, while highly popular pages are more often reproduced without modification. We compare editing activity across the two platforms and estimate page complexity using a fitness-complexity framework to assess whether generative mediation alters patterns of editorial participation. To assess narrative organization, we construct actor-relation networks from article texts using abstract meaning representation. Across multiple topical domains, including U.S. politics, geopolitics, and conspiracy-related narratives, narrative structure remains largely consistent between the two sources. Analysis of lead sections shows broadly correlated framing, with localized shifts in laudatory and conflict-oriented language for some topics in Grokipedia. Overall, generative systems preserve the main structural organization of encyclopedic content, while affecting how content is selected, rewritten, and framed.
Related papers
- Is Grokipedia Right-Leaning? Comparing Political Framing in Wikipedia and Grokipedia on Controversial Topics [2.374078750219017]
This paper presents a comparative analysis of Wikipedia and Grokipedia on well-established politically contested topics.<n>We find that semantic similarity between the two platforms decays across article sections and diverges more strongly on controversial topics than on randomly sampled ones.<n>We show that both encyclopedias predominantly exhibit left-leaning framings, although Grokipedia exhibits a more bimodal distribution with increased prominence of right-leaning content.
arXiv Detail & Related papers (2026-01-21T21:36:12Z) - Epistemic Substitution: How Grokipedia's AI-Generated Encyclopedia Restructures Authority [0.0]
A quarter century ago, Wikipedia's decentralized, crowdsourced, and consensus-driven model replaced the centralized, expert-driven, and authority-based standard for encyclopedic knowledge.<n>The emergence of generative AI encyclopedias, such as Grokipedia, possibly presents another potential shift in curation.<n>This study investigates whether AI- and human-curated encyclopedias rely on the same foundations of authority.
arXiv Detail & Related papers (2025-12-03T01:05:32Z) - How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison [0.0]
Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia.<n>This study undertakes a large-scale computational comparison of 1,800 matched article pairs between Grokipedia and Wikipedia.<n>Using metrics across lexical richness, readability, structural organization, reference density, and semantic similarity, we assess how closely the two platforms align in form and substance.
arXiv Detail & Related papers (2025-10-30T18:04:46Z) - DiscoSum: Discourse-aware News Summarization [79.4884227574627]
We introduce a novel approach to integrating discourse structure into summarization processes.<n>We present a novel summarization dataset where news articles are summarized multiple times in different ways across different social media platforms.<n>We develop a novel news discourse schema to describe summarization structures and a novel algorithm, DiscoSum, which employs beam search technique for structure-aware summarization.
arXiv Detail & Related papers (2025-06-07T22:00:30Z) - Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models [11.597314728459573]
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.
We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking.
arXiv Detail & Related papers (2024-02-22T01:20:17Z) - Mapping Process for the Task: Wikidata Statements to Text as Wikipedia
Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z) - WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions
from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z) - Compositional Temporal Grounding with Structured Variational Cross-Graph
Correspondence Learning [92.07643510310766]
Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We empirically find that they fail to generalize to queries with novel combinations of seen words.
We propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies.
arXiv Detail & Related papers (2022-03-24T12:55:23Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Paragraph-level Commonsense Transformers with Recurrent Memory [77.4133779538797]
We train a discourse-aware model that incorporates paragraph-level information to generate coherent commonsense inferences from narratives.
Our results show that PARA-COMET outperforms the sentence-level baselines, particularly in generating inferences that are both coherent and novel.
arXiv Detail & Related papers (2020-10-04T05:24:12Z) - How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing
Topics? [8.035521056416242]
We focus on the influence of the interconnect topology between articles describing complementary aspects of polarizing topics.
We introduce a novel measure of exposure to diverse information to quantify users' exposure to different aspects of a topic.
We identify cases in which the network topology significantly limits the exposure of users to diverse information on the topic, encouraging users to remain in a knowledge bubble.
arXiv Detail & Related papers (2020-07-16T09:19:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.