A Large-Scale Characterization of How Readers Browse Wikipedia
- URL: http://arxiv.org/abs/2112.11848v3
- Date: Wed, 18 Jan 2023 00:50:59 GMT
- Title: A Large-Scale Characterization of How Readers Browse Wikipedia
- Authors: Tiziano Piccardi, Martin Gerlach, Akhil Arora, Robert West
- Abstract summary: We present the first systematic large-scale analysis of how readers browse Wikipedia.
Using billions of page requests from Wikipedia's server logs, we measure how readers reach articles.
We find that navigation behavior is characterized by highly diverse structures.
- Score: 13.106604261718381
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the importance and pervasiveness of Wikipedia as one of the largest
platforms for open knowledge, surprisingly little is known about how people
navigate its content when seeking information. To bridge this gap, we present
the first systematic large-scale analysis of how readers browse Wikipedia.
Using billions of page requests from Wikipedia's server logs, we measure how
readers reach articles, how they transition between articles, and how these
patterns combine into more complex navigation paths. We find that navigation
behavior is characterized by highly diverse structures. Although most
navigation paths are shallow, comprising a single pageload, there is much
variety, and the depth and shape of paths vary systematically with topic,
device type, and time of day. We show that Wikipedia navigation paths commonly
mesh with external pages as part of a larger online ecosystem, and we describe
how naturally occurring navigation paths are distinct from targeted navigation
in lab-based settings. Our results further suggest that navigation is abandoned
when readers reach low-quality pages. Taken together, these insights contribute
to a more systematic understanding of readers' information needs and allow for
improving their experience on Wikipedia and the Web in general.
Related papers
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego$^2$-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Orphan Articles: The Dark Matter of Wikipedia [13.290424502717734]
We conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles.
We find that a surprisingly large extent of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia.
We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase of their visibility.
arXiv Detail & Related papers (2023-06-06T18:04:33Z)
- Towards Versatile Embodied Navigation [120.73460380993305]
Vienna is a versatile embodied navigation agent that simultaneously learns to perform the four navigation tasks with one model.
We empirically demonstrate that, compared with learning each visual navigation task individually, our agent achieves comparable or even better performance with reduced complexity.
arXiv Detail & Related papers (2022-10-30T11:53:49Z)
- Kuaipedia: a Large-scale Multi-modal Short-video Encyclopedia [59.47639408597319]
Kuaipedia is a large-scale multi-modal encyclopedia consisting of items, aspects, and short videos linked to them.
It was extracted from billions of videos of Kuaishou, a well-known short-video platform in China.
arXiv Detail & Related papers (2022-10-28T12:54:30Z)
- Wikipedia Reader Navigation: When Synthetic Data Is Enough [11.99768070409472]
We quantify the differences between real navigation sequences and synthetic sequences generated from the clickstream data.
We find that the differences between real and synthetic sequences are statistically significant, but with small effect sizes, often well below 10%.
This constitutes quantitative evidence for the utility of the Wikipedia clickstream data as a public resource.
arXiv Detail & Related papers (2022-01-03T18:58:39Z)
- Surfer100: Generating Surveys From Web Resources on Wikipedia-style [49.23675182917996]
We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation.
We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys.
arXiv Detail & Related papers (2021-12-13T02:18:01Z)
- A Large Scale Study of Reader Interactions with Images on Wikipedia [2.370481325034443]
This study is the first large-scale analysis of how interactions with images happen on Wikipedia.
We quantify the overall engagement with images, finding that one in 29 pageviews results in a click on at least one image.
We observe that clicks on images occur more often in shorter articles, in articles about visual arts or transport, and in biographies of less well-known people.
arXiv Detail & Related papers (2021-12-03T12:02:59Z)
- Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
The "embodied visual navigation" problem requires an agent to navigate a 3D environment relying mainly on its first-person observations.
This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
- Exploring Navigation Styles in a FutureLearn MOOC [61.58283466715385]
This paper presents for the first time a detailed analysis of fine-grained navigation style identification in MOOCs backed by a large number of active learners.
It provides insight into online learners' temporal engagement, as well as a tool to identify vulnerable learners.
arXiv Detail & Related papers (2020-08-10T19:12:21Z)
- How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing Topics? [8.035521056416242]
We focus on the influence of the interconnect topology between articles describing complementary aspects of polarizing topics.
We introduce a novel measure of exposure to diverse information to quantify users' exposure to different aspects of a topic.
We identify cases in which the network topology significantly limits the exposure of users to diverse information on the topic, encouraging users to remain in a knowledge bubble.
arXiv Detail & Related papers (2020-07-16T09:19:57Z)
- Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
arXiv Detail & Related papers (2020-03-11T07:48:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.