Quantifying Engagement with Citations on Wikipedia
- URL: http://arxiv.org/abs/2001.08614v2
- Date: Sun, 26 Jan 2020 17:38:32 GMT
- Title: Quantifying Engagement with Citations on Wikipedia
- Authors: Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, Robert West
- Abstract summary: One in 300 page views results in a reference click.
Clicks occur more frequently on shorter pages and on pages of lower quality.
Recent content, open access sources and references about life events are particularly popular.
- Score: 13.703047949952852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wikipedia, the free online encyclopedia that anyone can edit, is one of the
most visited sites on the Web and a common source of information for many
users. As an encyclopedia, Wikipedia is not a source of original information,
but was conceived as a gateway to secondary sources: according to Wikipedia's
guidelines, facts must be backed up by reliable sources that reflect the full
spectrum of views on the topic. Although citations lie at the very heart of
Wikipedia, little is known about how users interact with them. To close this
gap, we built client-side instrumentation for logging all interactions with
links leading from English Wikipedia articles to cited references during one
month, and conducted the first analysis of readers' interaction with citations
on Wikipedia. We find that overall engagement with citations is low: about one
in 300 page views results in a reference click (0.29% overall; 0.56% on
desktop; 0.13% on mobile). Matched observational studies of the factors
associated with reference clicking reveal that clicks occur more frequently on
shorter pages and on pages of lower quality, suggesting that references are
consulted more commonly when Wikipedia itself does not contain the information
sought by the user. Moreover, we observe that recent content, open access
sources and references about life events (births, deaths, marriages, etc.) are
particularly popular. Taken together, our findings open the door to a deeper
understanding of Wikipedia's role in a global information economy where
reliability is ever less certain, and source attribution ever more vital.
Related papers
- Forgotten Knowledge: Examining the Citational Amnesia in NLP [63.13508571014673]
We examine how far back in time we tend to go to cite papers, how that has changed over time, and what factors correlate with this citational attention/amnesia.
We show that around 62% of cited papers are from the immediate five years prior to publication, whereas only about 17% are more than ten years old.
We show that the median age and age diversity of cited papers were steadily increasing from 1990 to 2014, but since then, the trend has reversed, and current NLP papers have an all-time low temporal citation diversity.
arXiv Detail & Related papers (2023-05-29T18:30:34Z)
- Longitudinal Assessment of Reference Quality on Wikipedia [7.823541290904653]
This work analyzes the reliability of this global encyclopedia through the lens of its references.
We operationalize the notion of reference quality by defining reference need (RN), i.e., the percentage of sentences missing a citation, and reference risk (RR), i.e., the proportion of non-authoritative references.
arXiv Detail & Related papers (2023-03-09T13:04:14Z)
- Kuaipedia: a Large-scale Multi-modal Short-video Encyclopedia [59.47639408597319]
Kuaipedia is a large-scale multi-modal encyclopedia consisting of items, aspects, and short videos linked to them.
It was extracted from billions of videos of Kuaishou, a well-known short-video platform in China.
arXiv Detail & Related papers (2022-10-28T12:54:30Z)
- Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Our first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- A Large Scale Study of Reader Interactions with Images on Wikipedia [2.370481325034443]
This study is the first large-scale analysis of how interactions with images happen on Wikipedia.
We quantify the overall engagement with images, finding that one in 29 page views results in a click on at least one image.
We observe that clicks on images occur more often in shorter articles, in articles about visual arts or transport, and in biographies of less well-known people.
arXiv Detail & Related papers (2021-12-03T12:02:59Z)
- A Map of Science in Wikipedia [0.22843885788439797]
We map the relationship between Wikipedia articles and scientific journal articles.
Most journal articles cited from Wikipedia belong to STEM fields, in particular biology and medicine.
Wikipedia's biographies play an important role in connecting STEM fields with the humanities, especially history.
arXiv Detail & Related papers (2021-10-26T15:44:32Z)
- Assessing the quality of sources in Wikidata across languages: a hybrid approach [64.05097584373979]
We run a series of microtask experiments to evaluate a large corpus of references, sampled from Wikidata triples with labels in several languages.
We use a consolidated, curated version of the crowdsourced assessments to train several machine learning models to scale up the analysis to the whole of Wikidata.
The findings help us ascertain the quality of references in Wikidata, and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.
arXiv Detail & Related papers (2021-09-20T10:06:46Z)
- Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z)
- Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
arXiv Detail & Related papers (2020-03-11T07:48:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.