On the Value of Wikipedia as a Gateway to the Web
- URL: http://arxiv.org/abs/2102.07385v1
- Date: Mon, 15 Feb 2021 08:08:36 GMT
- Title: On the Value of Wikipedia as a Gateway to the Web
- Authors: Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, Robert West
- Abstract summary: In one month, English Wikipedia generated 43M clicks to external websites, in roughly even parts via links in infoboxes, cited references, and article bodies.
Official links listed in infoboxes have by far the highest click-through rate (CTR), 2.47% on average.
Wikipedia frequently serves as a stepping stone between search engines and third-party websites.
- Score: 13.703047949952852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By linking to external websites, Wikipedia can act as a gateway to the Web.
To date, however, little is known about the amount of traffic generated by
Wikipedia's external links. We fill this gap in a detailed analysis of usage
logs gathered from Wikipedia users' client devices. Our analysis proceeds in
three steps: First, we quantify the level of engagement with external links,
finding that, in one month, English Wikipedia generated 43M clicks to external
websites, in roughly even parts via links in infoboxes, cited references, and
article bodies. Official links listed in infoboxes have by far the highest
click-through rate (CTR), 2.47% on average. In particular, official links
associated with articles about businesses, educational institutions, and
websites have the highest CTR, whereas official links associated with articles
about geographical content, television, and music have the lowest CTR. Second,
we investigate patterns of engagement with external links, finding that
Wikipedia frequently serves as a stepping stone between search engines and
third-party websites, effectively fulfilling information needs that search
engines do not meet. Third, we quantify the hypothetical economic value of the
clicks received by external websites from English Wikipedia, by estimating that
the respective website owners would need to pay a total of $7-13 million per
month to obtain the same volume of traffic via sponsored search. Overall, these
findings shed light on Wikipedia's role not only as an important source of
information, but also as a high-traffic gateway to the broader Web ecosystem.
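As a back-of-the-envelope illustration of the two quantities the abstract reports, the sketch below computes a click-through rate and the average cost per click implied by the headline figures. It is not the authors' method. It assumes, for illustration only, that the $7-13 million estimate covers all 43M monthly clicks, and that CTR is simply clicks divided by the number of times a link was shown. The function names and the 247-in-10,000 counts are hypothetical.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Fraction of link impressions that led to a click (assumed definition)."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def implied_cost_per_click(total_value_usd: float, clicks: int) -> float:
    """Average price per click if `total_value_usd` bought `clicks` clicks."""
    if clicks <= 0:
        raise ValueError("clicks must be positive")
    return total_value_usd / clicks


# Figures taken from the abstract.
MONTHLY_CLICKS = 43_000_000
VALUE_LOW_USD = 7_000_000
VALUE_HIGH_USD = 13_000_000

# Assumption: the dollar range applies to all 43M clicks.
cpc_low = implied_cost_per_click(VALUE_LOW_USD, MONTHLY_CLICKS)
cpc_high = implied_cost_per_click(VALUE_HIGH_USD, MONTHLY_CLICKS)

# Hypothetical counts chosen only to reproduce the reported 2.47% average CTR.
example_ctr = click_through_rate(247, 10_000)

print(f"implied cost per click: ${cpc_low:.2f} to ${cpc_high:.2f}")
print(f"example CTR: {example_ctr:.2%}")
```

Under these assumptions the implied average price works out to roughly 16 to 30 cents per click. The abstract does not state how the estimate was broken down, so this is an order-of-magnitude reading only.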
Related papers
- Orphan Articles: The Dark Matter of Wikipedia [13.290424502717734]
We conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles.
We find that a surprisingly large extent of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia.
We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase of their visibility.
arXiv Detail & Related papers (2023-06-06T18:04:33Z)
- WebCPM: Interactive Web Search for Chinese Long-form Question Answering [104.676752359777]
Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses.
We introduce WebCPM, the first Chinese LFQA dataset.
We collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions.
arXiv Detail & Related papers (2023-05-11T14:47:29Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z)
- Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Our first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z)
- A Large Scale Study of Reader Interactions with Images on Wikipedia [2.370481325034443]
This study is the first large-scale analysis of how interactions with images happen on Wikipedia.
We quantify the overall engagement with images, finding that one in 29 page views results in a click on at least one image.
We observe that clicks on images occur more often in shorter articles and articles about visual arts or transports and biographies of less well-known people.
arXiv Detail & Related papers (2021-12-03T12:02:59Z)
- Where the Earth is flat and 9/11 is an inside job: A comparative algorithm audit of conspiratorial information in web search results [62.997667081978825]
We examine the distribution of conspiratorial information in search results across five search engines: Google, Bing, DuckDuckGo, Yahoo and Yandex.
We find that all search engines except Google consistently displayed conspiracy-promoting results and returned links to conspiracy-dedicated websites in their top results.
Most conspiracy-promoting results came from social media and conspiracy-dedicated websites while conspiracy-debunking information was shared by scientific websites and, to a lesser extent, legacy media.
arXiv Detail & Related papers (2021-12-02T14:29:21Z)
- Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages [60.00219873112454]
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted.
Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias.
The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv Detail & Related papers (2020-08-05T11:11:55Z)
- A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines [7.433327915285967]
We report the results of an investigation into the incidence of Wikipedia links in search engine results pages (SERPs).
We find that Wikipedia links are extremely common in important search contexts, appearing in 67-84% of all SERPs for common and trending queries, but less often for medical queries.
Our findings reinforce the complementary notions that (1) Wikipedia content and research have a major impact outside of the Wikipedia domain and (2) powerful technologies like search engines are highly reliant on free content created by volunteers.
arXiv Detail & Related papers (2020-04-21T19:58:28Z)
- Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
arXiv Detail & Related papers (2020-03-11T07:48:46Z)
- Quantifying Engagement with Citations on Wikipedia [13.703047949952852]
One in 300 page views results in a reference click.
Clicks occur more frequently on shorter pages and on pages of lower quality.
Recent content, open access sources and references about life events are particularly popular.
arXiv Detail & Related papers (2020-01-23T15:52:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.