Related papers: Impact of AI Search Summaries on Website Traffic: Evidence from Google AI Overviews and Wikipedia

URL: http://arxiv.org/abs/2602.18455v1
Date: Thu, 05 Feb 2026 01:31:44 GMT
Title: Impact of AI Search Summaries on Website Traffic: Evidence from Google AI Overviews and Wikipedia
Authors: Mehrzad Khosravi, Hema Yoganarasimhan
Abstract summary: We estimate the causal impact of Google's AI Overview on Wikipedia traffic. Across 161,382 matched article-language pairs, AIO exposure reduces daily traffic to English articles by approximately 15%. These findings provide early causal evidence that generative-answer features in search engines can materially reallocate attention away from informational publishers.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Search engines increasingly display LLM-generated answers shown above organic links, shifting search from link lists to answer-first summaries. Publishers contend these summaries substitute for source pages and cannibalize traffic, while platforms argue they are complementary by directing users through included links. We estimate the causal impact of Google's AI Overview (AIO) on Wikipedia traffic by leveraging the feature's staggered geographic rollout and Wikipedia's multilingual structure. Using a difference-in-differences design, we compare English Wikipedia articles exposed to AIO to the same underlying articles in language editions (Hindi, Indonesian, Japanese, and Portuguese) that were not exposed to AIO during the observation period. Across 161,382 matched article-language pairs, AIO exposure reduces daily traffic to English articles by approximately 15%. Effects are heterogeneous: relative declines are largest for Culture articles and substantially smaller for STEM, consistent with stronger substitution when short synthesized answers satisfy informational intent. These findings provide early causal evidence that generative-answer features in search engines can materially reallocate attention away from informational publishers, with implications for content monetization, search platform design, and policy.
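A minimal sketch of the 2x2 difference-in-differences logic described in the abstract, assuming hypothetical input rows of (treated, post, daily_views), where treated marks an English article exposed to AIO and post marks days after the rollout. The paper's full specification (fixed effects, matching, standard errors) is not given here, so this only illustrates how a DiD coefficient on log views maps to a relative change such as the reported ~15% decline.

```python
import math
from statistics import mean


def did_estimate(rows):
    """Hypothetical 2x2 difference-in-differences on log daily views.

    rows: iterable of (treated: bool, post: bool, daily_views: float > 0).
    Returns (beta, pct_change): beta is the DiD on log views and
    pct_change = exp(beta) - 1 is the implied relative change for treated units.
    """
    cells = {(t, p): [] for t in (True, False) for p in (True, False)}
    for treated, post, views in rows:
        # Log transform so the DiD coefficient reads as a proportional effect.
        cells[(treated, post)].append(math.log(views))
    m = {key: mean(vals) for key, vals in cells.items()}
    # (pre-to-post change for treated) minus (pre-to-post change for controls)
    beta = (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])
    return beta, math.exp(beta) - 1


# Synthetic example (not data from the paper): controls grow 10%,
# treated articles share that 10% trend but then lose 15%.
rows = [
    (False, False, 100.0), (False, True, 110.0),
    (True, False, 200.0), (True, True, 200.0 * 1.10 * 0.85),
]
beta, pct = did_estimate(rows)
print(f"beta={beta:.4f}, implied change={pct:.1%}")
```

The control editions (Hindi, Indonesian, Japanese, Portuguese in the study) absorb the common time trend, so only the extra change in the English group is attributed to AIO exposure.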

Related papers

How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison
Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as a response to perceived ideological and structural biases in Wikipedia. This study undertakes a large-scale computational comparison of 1,800 matched article pairs between Grokipedia and Wikipedia. Using metrics across lexical richness, readability, structural organization, reference density, and semantic similarity, we assess how closely the two platforms align in form and substance.
arXiv (2025-10-30T18:04:46Z)
Could AI Trace and Explain the Origins of AI-Generated Images and Text?
AI-generated content is increasingly prevalent in the real world. Adversaries might exploit large multimodal models to create images that violate ethical or legal standards. Paper reviewers may misuse large language models to generate reviews without genuine intellectual effort.
arXiv (2025-04-05T20:51:54Z)
The Rise of AI-Generated Content in Wikipedia
We use GPTZero, a proprietary AI detector, and Binoculars, an open-source alternative, to establish lower bounds on the presence of AI-generated content in recently created Wikipedia pages. With thresholds calibrated to achieve a 1% false positive rate on pre-GPT-3.5 articles, detectors flag over 5% of newly created English Wikipedia articles as AI-generated. Flagged Wikipedia articles are typically of lower quality and are often self-promotional or partial towards a specific viewpoint.
arXiv (2024-10-10T15:36:10Z)
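The calibration step in this summary (choosing a detector threshold that yields a 1% false positive rate on articles assumed to be human-written) can be sketched as a simple quantile cutoff. This is a generic sketch, not GPTZero's or Binoculars' API; the function names are hypothetical, and it assumes a higher score means "more AI-like" (negate the scores for a detector with the opposite convention).

```python
import math


def calibrate_threshold(reference_scores, target_fpr=0.01):
    """Hypothetical helper: pick a cutoff so that at most target_fpr of the
    reference scores (assumed human-written) are strictly above it.
    Assumes higher score = more AI-like."""
    if not reference_scores or not 0 <= target_fpr < 1:
        raise ValueError("need non-empty scores and 0 <= target_fpr < 1")
    s = sorted(reference_scores)
    k = math.floor(target_fpr * len(s))  # reference items allowed above the cutoff
    # Everything strictly above s[n - k - 1] is at most the top k items (fewer with ties).
    return s[len(s) - k - 1]


def flag_rate(scores, threshold):
    """Hypothetical helper: fraction of scores strictly above the threshold."""
    return sum(score > threshold for score in scores) / len(scores)


reference = [i / 1000 for i in range(1000)]  # synthetic stand-in for pre-GPT-3.5 scores
t = calibrate_threshold(reference, 0.01)
print(t, flag_rate(reference, t), flag_rate([0.995, 0.1, 0.5, 0.999], t))
```

Because the cutoff is fixed on the reference set, any flag rate on new articles above the target false positive rate is what supports a lower-bound reading of AI-generated content.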
Orphan Articles: The Dark Matter of Wikipedia
We conduct the first systematic study of orphan articles, which are articles without any incoming links from other Wikipedia articles. We find that a surprisingly large extent of content, roughly 15% (8.8M) of all articles, is de facto invisible to readers navigating Wikipedia. We also provide causal evidence through a quasi-experiment that adding new incoming links to orphans (de-orphanization) leads to a statistically significant increase of their visibility.
arXiv (2023-06-06T18:04:33Z)
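A minimal sketch of the orphan definition used here (no incoming links from other articles), assuming a hypothetical in-memory mapping from each article title to the titles it links to. Real Wikipedia link data also involves redirects, namespaces, and multiple language editions, which this sketch ignores.

```python
def find_orphans(links):
    """Hypothetical helper: return articles with no incoming links from OTHER articles.

    links: dict mapping article title -> iterable of linked article titles.
    Self-links do not count as incoming links.
    """
    has_incoming = set()
    for src, targets in links.items():
        for dst in targets:
            if dst != src:  # ignore an article linking to itself
                has_incoming.add(dst)
    return sorted(title for title in links if title not in has_incoming)


links = {"A": ["B"], "B": ["A"], "C": ["A", "C"], "D": []}
print(find_orphans(links))
```

In this toy graph, C links only to itself and A, and nothing links to D, so both count as orphans; de-orphanization would correspond to adding an edge such as B to C.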
Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv (2022-10-23T08:34:33Z)
WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles. The dataset consists of over 80k English samples on 6987 topics. Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv (2022-09-27T01:28:02Z)
Abstractive Summarization of Spoken and Written Instructions with BERT
We present the first application of the BERTSum model to conversational language. We generate abstractive summaries of narrated instructional videos across a wide variety of topics. We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv (2020-08-21T20:59:34Z)
Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages
We investigate the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Since Wikipedia is a central part of the web-based information landscape, this indicates a language-related, linguistic bias. The article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics.
arXiv (2020-08-05T11:11:55Z)
How Inclusive Are Wikipedia's Hyperlinks in Articles Covering Polarizing Topics?
We focus on the influence of the interconnect topology between articles describing complementary aspects of polarizing topics. We introduce a novel measure of exposure to diverse information to quantify users' exposure to different aspects of a topic. We identify cases in which the network topology significantly limits the exposure of users to diverse information on the topic, encouraging users to remain in a knowledge bubble.
arXiv (2020-07-16T09:19:57Z)
Design Challenges in Low-resource Cross-lingual Entity Linking
Cross-lingual Entity Linking (XEL) is the problem of grounding mentions of entities in a foreign language text into an English knowledge base such as Wikipedia. This paper focuses on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign language mention. We present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs.
arXiv (2020-05-02T04:00:26Z)
A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines
We report the results of an investigation into the incidence of Wikipedia links in search engine results pages (SERPs). We find that Wikipedia links are extremely common in important search contexts, appearing in 67-84% of all SERPs for common and trending queries, but less often for medical queries. Our findings reinforce the complementary notions that (1) Wikipedia content and research has major impact outside of the Wikipedia domain and (2) powerful technologies like search engines are highly reliant on free content created by volunteers.
arXiv (2020-04-21T19:58:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.