Longitudinal Assessment of Reference Quality on Wikipedia
- URL: http://arxiv.org/abs/2303.05227v1
- Date: Thu, 9 Mar 2023 13:04:14 GMT
- Title: Longitudinal Assessment of Reference Quality on Wikipedia
- Authors: Aitolkyn Baigutanova, Jaehyeon Myung, Diego Saez-Trumper, Ai-Jou Chou,
Miriam Redi, Changwook Jung, Meeyoung Cha
- Abstract summary: This work analyzes the reliability of this global encyclopedia through the lens of its references.
We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references.
- Score: 7.823541290904653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wikipedia plays a crucial role in the integrity of the Web. This work
analyzes the reliability of this global encyclopedia through the lens of its
references. We operationalize the notion of reference quality by defining
reference need (RN), i.e., the percentage of sentences missing a citation, and
reference risk (RR), i.e., the proportion of non-authoritative references. We
release Citation Detective, a tool for automatically calculating the RN score,
and discover that the RN score has dropped by 20 percentage points over the last
decade, with more than half of verifiable statements now accompanied by
references. The RR score has remained below 1% over the years as a result of
the efforts of the community to eliminate unreliable references. We propose
pairing novice and experienced editors on the same Wikipedia article as a
strategy to enhance reference quality. Our quasi-experiment indicates that such
a co-editing experience can result in a lasting advantage in identifying
unreliable sources in future edits. As Wikipedia is frequently used as the
ground truth for numerous Web applications, our findings and suggestions on its
reliability can have a far-reaching impact. We discuss the possibility of other
Web services adopting Wiki-style user collaboration to eliminate unreliable
content.
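The abstract defines the two metrics only informally. Below is a minimal sketch of how RN and RR could be computed for a single article; the citation-detection regex, the domain blocklist, and the helper names are illustrative assumptions, not the paper's actual Citation Detective pipeline.

```python
"""Sketch of the reference-quality metrics RN and RR (assumptions, not the paper's code)."""
import re
from urllib.parse import urlparse


def reference_need(sentences: list[str]) -> float:
    """RN: share of sentences that lack a citation.

    Assumption: a sentence 'has a citation' if it contains a <ref> tag or a
    {{cite ...}} template; the real system uses a learned citation-need model.
    """
    if not sentences:
        return 0.0
    cited = re.compile(r"<ref[\s>]|\{\{\s*cite", re.IGNORECASE)
    missing = sum(1 for s in sentences if not cited.search(s))
    return missing / len(sentences)


def reference_risk(reference_urls: list[str], blocklist: set[str]) -> float:
    """RR: proportion of references whose domain appears on a non-authoritative list."""
    if not reference_urls:
        return 0.0
    risky = sum(
        1 for url in reference_urls
        if urlparse(url).netloc.lower().removeprefix("www.") in blocklist
    )
    return risky / len(reference_urls)


if __name__ == "__main__":
    sentences = [
        "Wikipedia was launched in 2001.<ref>{{cite web|url=https://example.org}}</ref>",
        "It is one of the most visited websites in the world.",
    ]
    refs = ["https://example.org/a", "https://unreliable-tabloid.example/b"]
    blocklist = {"unreliable-tabloid.example"}  # placeholder, not the community's actual list
    print(f"RN = {reference_need(sentences):.2f}")        # 0.50
    print(f"RR = {reference_risk(refs, blocklist):.2f}")  # 0.50
```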
Related papers
- RevisEval: Improving LLM-as-a-Judge via Response-Adapted References [95.29800580588592]
RevisEval is a novel text generation evaluation paradigm via the response-adapted references.
RevisEval is driven by the key observation that an ideal reference should maintain the necessary relevance to the response to be evaluated.
arXiv Detail & Related papers (2024-10-07T16:50:47Z)
- HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits [92.62157408704594]
HelloFresh is based on continuous streams of real-world data generated by intrinsically motivated human labelers.
It covers recent events from X (formerly Twitter) community notes and edits of Wikipedia pages.
It mitigates the risk of test data contamination and benchmark overfitting.
arXiv Detail & Related papers (2024-06-05T16:25:57Z)
- A Comparative Study of Reference Reliability in Multiple Language Editions of Wikipedia [12.919146538916353]
This study examines over 5 million Wikipedia articles to assess the reliability of references in multiple language editions.
Some sources deemed untrustworthy in one language (i.e., English) continue to appear in articles in other languages.
Non-authoritative sources found in the English version of a page tend to persist in other language versions of that page.
arXiv Detail & Related papers (2023-09-01T01:19:59Z)
- Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References [123.39034752499076]
Div-Ref is a method to enhance evaluation benchmarks by enriching the number of references.
We conduct experiments to empirically demonstrate that diversifying the expression of reference can significantly enhance the correlation between automatic evaluation and human evaluation.
arXiv Detail & Related papers (2023-05-24T11:53:29Z)
- Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Its first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% of claims most likely to be unverifiable.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtasks experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- REAM$\sharp$: An Enhancement Approach to Reference-based Evaluation Metrics for Open-domain Dialog Generation [63.46331073232526]
We present an enhancement approach to Reference-based EvAluation Metrics for open-domain dialogue systems.
A prediction model is designed to estimate the reliability of the given reference set.
We show how its predicted results can be helpful to augment the reference set, and thus improve the reliability of the metric.
arXiv Detail & Related papers (2021-05-30T10:04:13Z)
- Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia [4.148821165759295]
We build the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues.
To build this dataset, we rely on Wikipedia "templates".
We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative.
arXiv Detail & Related papers (2021-05-10T05:07:03Z)
- 'I Updated the <ref>': The Evolution of References in the English Wikipedia and the Implications for Altmetrics [0.0]
We present a dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia until June 2019.
We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions.
arXiv Detail & Related papers (2020-10-06T23:26:12Z)
- Quantifying Engagement with Citations on Wikipedia [13.703047949952852]
One in 300 page views results in a reference click.
Clicks occur more frequently on shorter pages and on pages of lower quality.
Recent content, open access sources and references about life events are particularly popular.
arXiv Detail & Related papers (2020-01-23T15:52:36Z)