Data Voids and Warning Banners on Google Search
- URL: http://arxiv.org/abs/2502.17542v1
- Date: Mon, 24 Feb 2025 18:56:04 GMT
- Title: Data Voids and Warning Banners on Google Search
- Authors: Ronald E. Robertson, Evan M. Williams, Kathleen M. Carley, David Thiel
- Abstract summary: We collected 1.4M unique search queries shared on social media to surface Google's warning banners. We found that Google returned a warning banner for about 1% of our search queries. We identify 29 to 58 times more low-quality data voids than there were low-quality banners.
- Score: 4.4534065108405665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The content moderation systems used by social media sites are a topic of widespread interest and research, but less is known about the use of similar systems by web search engines. For example, Google Search attempts to help its users navigate three distinct types of data voids--when the available search results are deemed low-quality, low-relevance, or rapidly-changing--by placing one of three corresponding warning banners at the top of the search page. Here we collected 1.4M unique search queries shared on social media to surface Google's warning banners, examine when and why those banners were applied, and train deep learning models to identify data voids beyond Google's classifications. Across three data collection waves (Oct 2023, Mar 2024, Sept 2024), we found that Google returned a warning banner for about 1% of our search queries, with substantial churn in the set of queries that received a banner across waves. The low-quality banners, which warn users that their results "may not have reliable information on this topic," were especially rare, and their presence was associated with low-quality domains in the search results and conspiracy-related keywords in the search query. Low-quality banner presence was also inconsistent over short time spans, even when returning highly similar search results. In August 2024, low-quality banners stopped appearing on the SERPs we collected, but average search result quality remained largely unchanged, suggesting they may have been discontinued by Google. Using our deep learning models to analyze both queries and search results in context, we identify 29 to 58 times more low-quality data voids than there were low-quality banners, and find a similar number after the banners had disappeared. Our findings point to the need for greater transparency on search engines' content moderation practices, especially around important events like elections.
Related papers
- SmartSearch: Process Reward-Guided Query Refinement for Search Agents [63.46067892354375]
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems. Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked. We introduce SmartSearch, a framework built upon two key mechanisms to mitigate this issue.
arXiv Detail & Related papers (2026-01-08T12:39:05Z)
- The enshittification of online search? Privacy and quality of Google, Bing and Apple in coding advice [1.8528929583956726]
We evaluate the search quality of Google Search, Microsoft Bing, and Apple Search. We use two independent metrics of search quality: 1) the number of trackers on the first search result, as a measure of privacy in web search, and 2) the average rank of the first Stack Overflow search result. Our results suggest that the privacy of search results is higher on Bing than on Google and Apple. Similarly, the quality of coding advice -- as measured by the average rank of Stack Overflow -- was highest on Bing.
arXiv Detail & Related papers (2025-12-03T13:42:22Z)
- Into the Void: Understanding Online Health Information in Low-Web Data Languages [8.999413477506554]
We study the characteristics of search results for health queries in Tigrinya and Amharic as languages of study. We find that search results for health queries in low-web data languages may not always be in the language of search. We show that search results that diverge from their queries in low-resourced languages are due to algorithmic failures, (un)intentional manipulation, or active manipulation by content creators.
arXiv Detail & Related papers (2025-09-24T15:35:01Z)
- Implicit Search via Discrete Diffusion: A Study on Chess [104.74301574891359]
We propose DiffuSearch, a model that does implicit search by looking into the future world via discrete diffusion modeling.
We instantiate DiffuSearch on a classical board game, Chess, where explicit search is known to be essential.
We show DiffuSearch outperforms both the searchless and explicit search-enhanced policies.
arXiv Detail & Related papers (2025-02-27T06:25:15Z)
- Auditing Google's Search Algorithm: Measuring News Diversity Across Brazil, the UK, and the US [0.0]
This study examines the influence of Google's search algorithm on news diversity by analyzing search results in Brazil, the UK, and the US.
It explores how Google's system preferentially favors a limited number of news outlets.
Findings indicate a slight leftward bias in search outcomes and a preference for popular, often national outlets.
arXiv Detail & Related papers (2024-10-31T11:49:16Z)
- Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z)
- Algorithmically Curated Lies: How Search Engines Handle Misinformation about US Biolabs in Ukraine [39.58317527488534]
We conduct virtual agent-based algorithm audits of Google, Bing, and Yandex search outputs in June 2022.
We find significant disparities in misinformation exposure based on the language of search, with all search engines presenting a higher number of false stories in Russian.
These observations stress the possibility of AICSs being vulnerable to manipulation, in particular in the case of the unfolding propaganda campaigns.
arXiv Detail & Related papers (2024-01-24T22:15:38Z)
- User Attitudes to Content Moderation in Web Search [49.1574468325115]
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
arXiv Detail & Related papers (2023-10-05T10:57:15Z)
- Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising [58.09698019028931]
How to pair the video ads with the user search is the core task of Baidu video advertising.
Due to the modality gap, the query-to-video retrieval is much more challenging than traditional query-to-document retrieval.
We present a tree-based combo-attention network (TCAN) which has been recently launched in Baidu's dynamic video advertising platform.
arXiv Detail & Related papers (2022-09-19T04:49:51Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines [68.8204255655161]
We look at the text search results for "us elections", "donald trump", "joe biden" and "bernie sanders" queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex.
Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents.
arXiv Detail & Related papers (2021-05-03T11:18:19Z)
- Search Engine Similarity Analysis: A Combined Content and Rankings Approach [6.69087470775851]
We present an analysis of the affinity of the two major search engines, Google and Bing, along with DuckDuckGo.
We developed a new similarity metric that leverages both the content and the ranking of search responses.
We found that Google stands apart, but Bing and DuckDuckGo are largely indistinguishable from each other.
arXiv Detail & Related papers (2020-11-01T23:57:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.