A preliminary approach to knowledge integrity risk assessment in
Wikipedia projects
- URL: http://arxiv.org/abs/2106.15940v1
- Date: Wed, 30 Jun 2021 09:47:27 GMT
- Title: A preliminary approach to knowledge integrity risk assessment in
Wikipedia projects
- Authors: Pablo Aragón, Diego Sáez-Trumper
- Abstract summary: We introduce a taxonomy of knowledge integrity risks across Wikipedia projects and a first set of indicators to assess internal risks related to community and content issues.
On top of this taxonomy, we offer a preliminary analysis illustrating how the lack of editors' geographical diversity might represent a knowledge integrity risk.
These are the first steps of a research project to build a Wikipedia Knowledge Integrity Risk Observatory.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Wikipedia is one of the main repositories of free knowledge available today,
with a central role in the Web ecosystem. For this reason, it can also be a
battleground for actors trying to impose specific points of view or even to
spread disinformation online. There is a growing need to monitor its
"health", but this is not an easy task. Wikipedia exists in over 300 language
editions, and each project is maintained by a different community with its
own strengths, weaknesses, and limitations. In this paper, we introduce a
taxonomy of knowledge integrity risks across Wikipedia projects and a first set
of indicators to assess internal risks related to community and content issues,
as well as external threats such as the geopolitical and media landscape. On
top of this taxonomy, we offer a preliminary analysis illustrating how the lack
of editors' geographical diversity might represent a knowledge integrity risk.
These are the first steps of a research project to build a Wikipedia Knowledge
Integrity Risk Observatory.
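As a rough illustration of the kind of community indicator mentioned above, the sketch below computes a normalized Shannon entropy over the distribution of active editors per country for a single project. The paper's actual geographical-diversity indicator is not specified here, so the function name, the input format, and the toy numbers are assumptions made only for illustration.

    from collections import Counter
    from math import log

    def geographic_diversity(editor_countries):
        """Normalized Shannon entropy of active editors per country (assumed indicator).

        Returns a value in [0, 1]: 0 when all editors come from a single country,
        1 when editors are spread evenly across countries. `editor_countries` is a
        list with one country code per active editor of a given Wikipedia project.
        """
        counts = Counter(editor_countries)
        total = sum(counts.values())
        if total == 0 or len(counts) < 2:
            return 0.0  # no diversity with no editors or a single country
        entropy = -sum((c / total) * log(c / total) for c in counts.values())
        return entropy / log(len(counts))  # normalize by the maximum possible entropy

    # Hypothetical example: a project dominated by editors from one country scores
    # much lower than a project with a geographically balanced editor base.
    print(geographic_diversity(["DE"] * 90 + ["AT"] * 5 + ["CH"] * 5))  # ~0.36
    print(geographic_diversity(["NG", "KE", "GH", "ZA", "EG"] * 20))    # 1.0

A low value on such an indicator would flag a project whose content depends on a geographically narrow editor community, which is one way the lack of diversity discussed above could be surfaced as a risk signal.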
Related papers
- Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles [56.724847946825285]
We introduce Wiki Live Challenge (WLC), a live benchmark that leverages the newest Wikipedia Good Articles (GAs) as expert-level references.
We propose Wiki Eval, a comprehensive evaluation framework comprising a fine-grained evaluation method with 39 criteria for writing quality and rigorous metrics for factual verifiability.
arXiv Detail & Related papers (2026-02-02T03:30:13Z)
- Diagnosing and Mitigating Semantic Inconsistencies in Wikidata's Classification Hierarchy [1.4705700441788643]
Wikidata is the largest open knowledge graph on the web, encompassing over 120 million entities.
This study proposes and applies a novel validation method to confirm the presence of classification errors and over-generalized subclass links.
We develop a system that allows users to inspect the taxonomic relationships of arbitrary Wikidata entities.
arXiv Detail & Related papers (2025-11-07T02:09:00Z)
- Factual Inconsistencies in Multilingual Wikipedia Tables [5.395647076142643]
This study investigates cross-lingual inconsistencies in Wikipedia's structured content.
We develop a methodology to collect, align, and analyze tables from multilingual Wikipedia articles.
These insights have implications for factual verification, multilingual knowledge interaction, and the design of reliable AI systems.
arXiv Detail & Related papers (2025-07-24T13:46:14Z)
- A Community-driven vision for a new Knowledge Resource for AI [59.29703403953085]
Despite the success of knowledge resources like WordNet, verifiable, general-purpose, widely available sources of knowledge remain a critical deficiency in AI infrastructure.
This paper synthesizes our findings and outlines a community-driven vision for a new knowledge infrastructure.
arXiv Detail & Related papers (2025-06-19T20:51:28Z)
- Web2Wiki: Characterizing Wikipedia Linking Across the Web [19.00204665059246]
We identify over 90 million Wikipedia links spanning 1.68% of Web domains.
Wikipedia is most frequently cited by news and science websites for informational purposes.
Most links serve as explanatory references rather than as evidence or attribution.
arXiv Detail & Related papers (2025-05-17T00:52:24Z)
- Characterizing Knowledge Manipulation in a Russian Wikipedia Fork [18.630486406259426]
The recently launched website Ruwiki has copied and modified original Russian Wikipedia content to conform to Russian law.
This article presents an in-depth analysis of this Russian Wikipedia fork.
We propose a methodology to characterize the main changes with respect to the original version.
arXiv Detail & Related papers (2025-04-14T19:30:30Z)
- Orphan Articles: The Dark Matter of Wikipedia [13.290424502717734]
We conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles (a minimal detection sketch appears after this list).
We find that a surprisingly large share of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia.
We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase in their visibility.
arXiv Detail & Related papers (2023-06-06T18:04:33Z)
- Between News and History: Identifying Networked Topics of Collective Attention on Wikipedia [0.0]
We develop a temporal community detection approach to topic detection.
We apply this method to a dataset of one year of current events on Wikipedia.
We are able to distinguish topics that more strongly reflect unfolding current events from those reflecting more established knowledge.
arXiv Detail & Related papers (2022-11-14T18:36:21Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus [76.9522248303716]
We propose a new setup for evaluating existing KI-NLP tasks in which we generalize the background corpus to a universal web snapshot.
We repurpose KILT, a standard KI-NLP benchmark initially developed for Wikipedia, and ask systems to use a subset of CCNet - the Sphere corpus.
We find that despite potential gaps in coverage, challenges of scale, lack of structure, and lower quality, retrieval from Sphere enables a state-of-the-art retrieve-and-read system to match and even outperform Wikipedia-based models.
arXiv Detail & Related papers (2021-12-18T13:15:34Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- Dimensions of Commonsense Knowledge [60.49243784752026]
We survey a wide range of popular commonsense sources with a special focus on their relations.
We consolidate these relations into 13 knowledge dimensions, each abstracting over more specific relations found in sources.
arXiv Detail & Related papers (2021-01-12T17:52:39Z)
- Computational linguistic assessment of textbook and online learning media by means of threshold concepts in business education [59.003956312175795]
From a linguistic perspective, threshold concepts are instances of specialized vocabularies, exhibiting particular linguistic features.
The profiles of 63 threshold concepts from business education have been investigated in textbooks, newspapers, and Wikipedia.
The three kinds of resources can indeed be distinguished in terms of their threshold concepts' profiles.
arXiv Detail & Related papers (2020-08-05T12:56:16Z)
- Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z)
- How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing Topics? [8.035521056416242]
We focus on the influence of the interconnect topology between articles describing complementary aspects of polarizing topics.
We introduce a novel measure of exposure to diverse information to quantify users' exposure to different aspects of a topic.
We identify cases in which the network topology significantly limits the exposure of users to diverse information on the topic, encouraging users to remain in a knowledge bubble.
arXiv Detail & Related papers (2020-07-16T09:19:57Z)
- Architecture for a multilingual Wikipedia [0.0]
We argue that a new approach is needed to tackle the uneven coverage of content across Wikipedia's language editions more effectively.
This paper proposes an architecture for a system that fulfills this goal.
It separates the goal into two parts: creating and maintaining content in an abstract notation within a project called Abstract Wikipedia, and creating an infrastructure called Wikilambda that can translate this notation to natural language.
arXiv Detail & Related papers (2020-04-08T22:25:10Z)
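The orphan-article entry above defines orphans as articles with no incoming links from other Wikipedia articles. As a minimal sketch of that definition (assuming a plain list of main-namespace page links rather than the study's actual dataset or tooling), detection reduces to a set-membership check:

    def find_orphans(articles, links):
        """Return the articles that receive no incoming links from other articles."""
        has_incoming = set()
        for source, target in links:
            if source != target:  # self-links do not make an article reachable
                has_incoming.add(target)
        return [title for title in articles if title not in has_incoming]

    # Hypothetical toy link graph: "C" links out but is never linked to, so it is an orphan.
    articles = ["A", "B", "C"]
    links = [("A", "B"), ("B", "A"), ("C", "A")]
    print(find_orphans(articles, links))  # ['C']

Applied to a complete main-namespace link table, a check along these lines would surface the set of articles the study describes as de facto invisible to navigating readers; de-orphanization then amounts to adding at least one incoming link from another article.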