A Deeper Investigation of the Importance of Wikipedia Links to the
Success of Search Engines
- URL: http://arxiv.org/abs/2004.10265v1
- Date: Tue, 21 Apr 2020 19:58:28 GMT
- Authors: Nicholas Vincent and Brent Hecht
- Abstract summary: We report the results of an investigation into the incidence of Wikipedia links in search engine results pages (SERPs).
We find that Wikipedia links are extremely common in important search contexts, appearing in 67-84% of all SERPs for common and trending queries, but less often for medical queries.
Our findings reinforce the complementary notions that (1) Wikipedia content and research has major impact outside of the Wikipedia domain and (2) powerful technologies like search engines are highly reliant on free content created by volunteers.
- Score: 7.433327915285967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A growing body of work has highlighted the important role that Wikipedia's
volunteer-created content plays in helping search engines achieve their core
goal of addressing the information needs of millions of people. In this paper,
we report the results of an investigation into the incidence of Wikipedia links
in search engine results pages (SERPs). Our results extend prior work by
considering three U.S. search engines, simulating both mobile and desktop
devices, and using a spatial analysis approach designed to study modern SERPs
that are no longer just "ten blue links". We find that Wikipedia links are
extremely common in important search contexts, appearing in 67-84% of all SERPs
for common and trending queries, but less often for medical queries.
Furthermore, we observe that Wikipedia links often appear in "Knowledge Panel"
SERP elements and are in positions visible to users without scrolling, although
Wikipedia appears less in prominent positions on mobile devices. Our findings
reinforce the complementary notions that (1) Wikipedia content and research has
major impact outside of the Wikipedia domain and (2) powerful technologies like
search engines are highly reliant on free content created by volunteers.
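The incidence measure described in the abstract (the share of SERPs containing at least one Wikipedia link, optionally restricted to links visible without scrolling) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `(url, top_y)` record layout, the helper names `is_wikipedia` and `incidence`, the sample URLs, and the 800-pixel viewport are all assumptions made for the example.

```python
from urllib.parse import urlparse


def is_wikipedia(url: str) -> bool:
    """Return True if the URL's host is wikipedia.org or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def incidence(serps, viewport_height=None):
    """Fraction of SERPs that contain at least one Wikipedia link.

    serps: list of SERPs, each a list of (url, top_y) pairs, where top_y is
    the link's vertical pixel offset on the page (hypothetical layout).
    viewport_height: if given, only links whose top edge lies above this
    offset count, approximating "visible without scrolling".
    """
    if not serps:
        return 0.0
    hits = 0
    for links in serps:
        for url, top_y in links:
            visible = viewport_height is None or top_y < viewport_height
            if visible and is_wikipedia(url):
                hits += 1
                break  # one qualifying link is enough for this SERP
    return hits / len(serps)


# Made-up sample data, for illustration only.
serps = [
    [("https://en.wikipedia.org/wiki/Search_engine", 120), ("https://example.com/a", 400)],
    [("https://example.com/b", 100), ("https://en.m.wikipedia.org/wiki/Web", 1500)],
    [("https://example.org/c", 90)],
    [("https://notwikipedia.org/x", 50)],
]

full_page = incidence(serps)                        # any position on the page
above_fold = incidence(serps, viewport_height=800)  # assumed viewport height
```

Comparing `full_page` with `above_fold` mirrors the abstract's distinction between links that appear anywhere in a SERP and links in positions visible without scrolling.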
Related papers
- User Attitudes to Content Moderation in Web Search [49.1574468325115]
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
arXiv Detail & Related papers (2023-10-05T10:57:15Z)
- Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z)
- Towards Proactive Information Retrieval in Noisy Text with Wikipedia Concepts [6.744385328015561]
This work explores how exploiting the context of a query using Wikipedia concepts can improve proactive information retrieval on noisy text.
Our experiments around a podcast segment retrieval task demonstrate that there is a clear signal of relevance in Wikipedia concepts.
We also find Wikifying the background context of a query can help disambiguate the meaning of the query, further helping proactive information retrieval.
arXiv Detail & Related papers (2022-10-18T14:12:06Z)
- Improving Wikipedia Verifiability with AI [116.69749668874493]
We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims.
Our first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims.
Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
arXiv Detail & Related papers (2022-07-08T15:23:29Z)
- Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z)
- A Large-Scale Characterization of How Readers Browse Wikipedia [13.106604261718381]
We present the first systematic large-scale analysis of how readers browse Wikipedia.
Using billions of page requests from Wikipedia's server logs, we measure how readers reach articles.
We find that navigation behavior is characterized by highly diverse structures.
arXiv Detail & Related papers (2021-12-22T12:54:44Z)
- The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus [76.9522248303716]
We propose a new setup for evaluating existing KI-NLP tasks in which we generalize the background corpus to a universal web snapshot.
We repurpose KILT, a standard KI-NLP benchmark initially developed for Wikipedia, and ask systems to use a subset of CCNet - the Sphere corpus.
We find that despite potential gaps of coverage, challenges of scale, lack of structure and lower quality, retrieval from Sphere enables a state-of-the-art retrieve-and-read system to match and even outperform Wikipedia-based models.
arXiv Detail & Related papers (2021-12-18T13:15:34Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- A Large Scale Study of Reader Interactions with Images on Wikipedia [2.370481325034443]
This study is the first large-scale analysis of how interactions with images happen on Wikipedia.
We quantify the overall engagement with images, finding that one in 29 pageviews results in a click on at least one image.
We observe that clicks on images occur more often in shorter articles and articles about visual arts or transports and biographies of less well-known people.
arXiv Detail & Related papers (2021-12-03T12:02:59Z)
- On the Value of Wikipedia as a Gateway to the Web [13.703047949952852]
In one month, English Wikipedia generated 43M clicks to external websites, in roughly even parts via links in infoboxes, cited references, and article bodies.
Official links listed in infoboxes have by far the highest click-through rate (CTR), 2.47% on average.
Wikipedia frequently serves as a stepping stone between search engines and third-party websites.
arXiv Detail & Related papers (2021-02-15T08:08:36Z)
- Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
arXiv Detail & Related papers (2020-03-11T07:48:46Z)