How Grounded is Wikipedia? A Study on Structured Evidential Support
- URL: http://arxiv.org/abs/2506.12637v1
- Date: Sat, 14 Jun 2025 21:40:14 GMT
- Title: How Grounded is Wikipedia? A Study on Structured Evidential Support
- Authors: William Walden, Kathryn Ricci, Miriam Wanner, Zhengping Jiang, Chandler May, Rongkun Zhou, Benjamin Van Durme
- Abstract summary: We show that roughly 20% of claims in Wikipedia *lead* sections are unsupported by the article body. We also show that recovery of complex grounding evidence for claims that *are* supported remains a challenge for standard retrieval methods.
- Score: 27.55382517488165
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work provides a quantitative analysis of the extent to which Wikipedia *is* so grounded and of how readily grounding evidence may be retrieved. To this end, we introduce PeopleProfiles -- a large-scale, multi-level dataset of claim support annotations on Wikipedia articles of notable people. We show that roughly 20% of claims in Wikipedia *lead* sections are unsupported by the article body; roughly 27% of annotated claims in the article *body* are unsupported by their (publicly accessible) cited sources; and >80% of lead claims cannot be traced to these sources via annotated body evidence. Further, we show that recovery of complex grounding evidence for claims that *are* supported remains a challenge for standard retrieval methods.
Related papers
- Web2Wiki: Characterizing Wikipedia Linking Across the Web [19.00204665059246]
We identify over 90 million Wikipedia links spanning 1.68% of Web domains. Wikipedia is most frequently cited by news and science websites for informational purposes. Most links serve as explanatory references rather than as evidence or attribution.
arXiv Detail & Related papers (2025-05-17T00:52:24Z)
- WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z)
- Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Our first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z)
- Generating Literal and Implied Subquestions to Fact-check Complex Claims [64.81832149826035]
We focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim.
We present ClaimDecomp, a dataset of decompositions for over 1000 claims.
We show that these subquestions can help identify relevant evidence to fact-check the full claim and derive the veracity through their answers.
arXiv Detail & Related papers (2022-05-14T00:40:57Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- A Map of Science in Wikipedia [0.22843885788439797]
We map the relationship between Wikipedia articles and scientific journal articles.
Most journal articles cited from Wikipedia belong to STEM fields, in particular biology and medicine.
Wikipedia's biographies play an important role in connecting STEM fields with the humanities, especially history.
arXiv Detail & Related papers (2021-10-26T15:44:32Z)
- WhatTheWikiFact: Fact-Checking Claims Against Wikipedia [17.36054090232896]
We present WhatTheWikiFact, a system for automatic claim verification using Wikipedia.
The system predicts the veracity of an input claim, and it further shows the evidence it has retrieved as part of the verification process.
arXiv Detail & Related papers (2021-04-16T12:23:56Z)
- HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification [74.66819506353086]
HoVer is a dataset for many-hop evidence extraction and fact verification.
It challenges models to extract facts from several Wikipedia articles that are relevant to a claim.
Most of the 3/4-hop claims are written in multiple sentences, which adds to the complexity of understanding long-range dependency relations.
arXiv Detail & Related papers (2020-11-05T20:33:11Z)
- Quantifying Engagement with Citations on Wikipedia [13.703047949952852]
One in 300 page views results in a reference click.
Clicks occur more frequently on shorter pages and on pages of lower quality.
Recent content, open access sources and references about life events are particularly popular.
arXiv Detail & Related papers (2020-01-23T15:52:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.