Search Engine Similarity Analysis: A Combined Content and Rankings
Approach
- URL: http://arxiv.org/abs/2011.00650v2
- Date: Fri, 6 Nov 2020 17:11:10 GMT
- Title: Search Engine Similarity Analysis: A Combined Content and Rankings
Approach
- Authors: Konstantina Dritsa, Thodoris Sotiropoulos, Haris Skarpetis, Panos
Louridas
- Abstract summary: We present an analysis of the affinity of the two major search engines, Google and Bing, along with DuckDuckGo.
We developed a new similarity metric that leverages both the content and the ranking of search responses.
We found that Google stands apart, but Bing and DuckDuckGo are largely indistinguishable from each other.
- Score: 6.69087470775851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How different are search engines? The search engine wars are a favorite topic
of on-line analysts, as two of the biggest companies in the world, Google and
Microsoft, battle for prevalence of the web search space. Differences in search
engine popularity can be explained by their effectiveness or other factors,
such as familiarity with the most popular first engine, peer imitation, or
force of habit. In this work we present a thorough analysis of the affinity of
the two major search engines, Google and Bing, along with DuckDuckGo, which
goes to great lengths to emphasize its privacy-friendly credentials. To do so,
we collected search results using a comprehensive set of 300 unique queries for
two time periods in 2016 and 2019, and developed a new similarity metric that
leverages both the content and the ranking of search responses. We evaluated
the characteristics of the metric against other metrics and approaches that
have been proposed in the literature, and used it to (1) investigate the
similarities of search engine results, (2) the evolution of their affinity over
time, (3) what aspects of the results influence similarity, and (4) how the
metric differs over different kinds of search services. We found that Google
stands apart, but Bing and DuckDuckGo are largely indistinguishable from each
other.
Related papers
- SmartSearch: Process Reward-Guided Query Refinement for Search Agents [63.46067892354375]
Large language model (LLM)-based search agents have proven promising for addressing knowledge-intensive problems.
Existing works largely focus on optimizing the reasoning paradigms of search agents, yet the quality of intermediate search queries during reasoning remains overlooked.
We introduce SmartSearch, a framework built upon two key mechanisms to mitigate this issue.
arXiv Detail & Related papers (2026-01-08T12:39:05Z)
- Unexpected Knowledge: Auditing Wikipedia and Grokipedia Search Recommendations [1.4323566945483497]
We provide the first comparative analysis of the search engines of Wikipedia and Grokipedia.
We collect over 70,000 search engine results and examine their semantic alignment, overlap, and topical structure.
Our findings show that unexpected search engine outcomes are a common feature of both platforms.
arXiv Detail & Related papers (2025-12-18T19:41:58Z)
- The enshittification of online search? Privacy and quality of Google, Bing and Apple in coding advice [1.8528929583956726]
We evaluate the search quality of Google Search, Microsoft Bing, and Apple Search.
We use two independent metrics of search quality: 1) the number of trackers on the first search result, as a measure of privacy in web search, and 2) the average rank of the first Stack Overflow search result.
Our results suggest that the privacy of search results is higher on Bing than on Google and Apple. Similarly, the quality of coding advice -- as measured by the average rank of Stack Overflow -- was highest on Bing.
arXiv Detail & Related papers (2025-12-03T13:42:22Z)
- Characterizing Web Search in The Age of Generative AI [7.059953211629231]
We compare Google, a traditional web search engine, with four generative search engines from two providers (Google and OpenAI).
Generative search engines vary in the degree to which they rely on internal knowledge contained within the model parameters vs. external knowledge retrieved from the web.
Our results highlight the need for revisiting evaluation criteria for web search in the age of Generative AI.
arXiv Detail & Related papers (2025-10-13T16:04:03Z)
- Generative Engine Optimization: How to Dominate AI Search [13.959899706228176]
Generative AI-powered search engines like ChatGPT, Perplexity, and Gemini are reshaping information retrieval.
This paper presents a comprehensive analysis of AI Search and traditional web search (Google).
Our key findings reveal that AI Search exhibits a systematic and overwhelming bias towards Earned media (third-party, authoritative sources) over Brand-owned and Social content.
arXiv Detail & Related papers (2025-09-10T18:29:18Z)
- Digital Gatekeeping: An Audit of Search Engine Results shows tailoring of queries on the Israel-Palestine Conflict [3.9633322041283665]
We focus on the Israel-Palestine conflict and developed a privacy-protecting tool to audit the behavior of three search engines: DuckDuckGo, Google and Yahoo.
Our findings revealed significant customization based on location and browsing preferences, unlike previous studies that found only mild personalization for general topics.
Queries related to the conflict were more customized than unrelated queries, and the results were not neutral concerning the conflict's portrayal.
arXiv Detail & Related papers (2025-02-06T18:05:30Z)
- The Essence of the Essence from the Web:The Metasearch Engine [0.0]
A metasearch engine reduces the user's burden by dispatching queries to multiple search engines in parallel.
These engines do not own a database of Web pages; rather, they send search terms to the databases maintained by the search engine companies.
In this paper, we describe the workings of a typical metasearch engine and then present a comparative study of traditional search engines and metasearch engines on the basis of different parameters.
arXiv Detail & Related papers (2024-11-06T06:56:22Z)
- Exploring Query Understanding for Amazon Product Search [62.53282527112405]
We study how query understanding-based ranking features influence the ranking process.
We propose a query understanding-based multi-task learning framework for ranking.
We present our studies and investigations using the real-world system on Amazon Search.
arXiv Detail & Related papers (2024-08-05T03:33:11Z)
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher [50.68599514830046]
We introduce MindSearch to mimic the human minds in web information seeking and integration.
The framework can be instantiated by a simple yet effective LLM-based multi-agent framework.
MindSearch demonstrates significant improvement in the response quality in terms of depth and breadth.
arXiv Detail & Related papers (2024-07-29T17:12:40Z)
- Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z)
- A comparison of online search engine autocompletion in Google and Baidu [3.5016560416031886]
We study the characteristics of search auto-completions in two different linguistic and cultural contexts: Baidu and Google.
We find differences between the two search engines in the way they suppress or modify original queries.
Our study highlights the need for more refined, culturally sensitive moderation strategies in current language technologies.
arXiv Detail & Related papers (2024-05-03T08:17:04Z)
- The Use of Generative Search Engines for Knowledge Work and Complex Tasks [26.583783763090732]
We analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search.
Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.
arXiv Detail & Related papers (2024-03-19T18:17:46Z)
- User Attitudes to Content Moderation in Web Search [49.1574468325115]
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
arXiv Detail & Related papers (2023-10-05T10:57:15Z)
- Evaluating Verifiability in Generative Search Engines [70.59477647085387]
Generative search engines directly generate responses to user queries, along with in-line citations.
We conduct human evaluation to audit four popular generative search engines.
We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations.
arXiv Detail & Related papers (2023-04-19T17:56:12Z)
- NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost [4.186775801993103]
We describe NeuralSearchX, a metasearch engine based on a multi-purpose large reranking model to merge results and highlight sentences.
We show that our design choices led to a much more cost-effective system with competitive QPS while achieving close to state-of-the-art results on a wide range of public benchmarks.
arXiv Detail & Related papers (2022-10-26T16:36:53Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines [68.8204255655161]
We look at the text search results for "us elections", "donald trump", "joe biden" and "bernie sanders" queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex.
Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents.
arXiv Detail & Related papers (2021-05-03T11:18:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.