Search Engine Similarity Analysis: A Combined Content and Rankings Approach
- URL: http://arxiv.org/abs/2011.00650v2
- Date: Fri, 6 Nov 2020 17:11:10 GMT
- Title: Search Engine Similarity Analysis: A Combined Content and Rankings Approach
- Authors: Konstantina Dritsa, Thodoris Sotiropoulos, Haris Skarpetis, Panos Louridas
- Abstract summary: We present an analysis of the affinity of the two major search engines, Google and Bing, along with DuckDuckGo.
We developed a new similarity metric that leverages both the content and the ranking of search responses.
We found that Google stands apart, but Bing and DuckDuckGo are largely indistinguishable from each other.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How different are search engines? The search engine wars are a favorite topic
of on-line analysts, as two of the biggest companies in the world, Google and
Microsoft, battle for prevalence of the web search space. Differences in search
engine popularity can be explained by their effectiveness or other factors,
such as familiarity with the first and most popular engine, peer imitation, or
force of habit. In this work we present a thorough analysis of the affinity of
the two major search engines, Google and Bing, along with DuckDuckGo, which
goes to great lengths to emphasize its privacy-friendly credentials. To do so,
we collected search results using a comprehensive set of 300 unique queries for
two time periods in 2016 and 2019, and developed a new similarity metric that
leverages both the content and the ranking of search responses. We evaluated
the characteristics of the metric against other metrics and approaches that
have been proposed in the literature, and used it to investigate (1) the
similarities of search engine results, (2) the evolution of their affinity over
time, (3) which aspects of the results influence similarity, and (4) how the
metric differs across different kinds of search services. We found that Google
stands apart, but Bing and DuckDuckGo are largely indistinguishable from each
other.
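The abstract does not define the new metric, so it is not reproduced here. For orientation only, the following Python sketch implements two list-similarity measures from the literature: one that ignores order (Jaccard similarity over the sets of returned results) and one that is rank-aware (extrapolated Rank-Biased Overlap, RBO, from Webber, Moffat and Zobel, 2010), which rewards agreement near the top of the ranking. Neither is the authors' metric. The function names, the persistence parameter `p = 0.9`, and the assumption that each search response is a duplicate-free list of result URLs compared to a common depth are illustrative choices, not the authors' setup.

```python
from collections.abc import Hashable, Sequence


def jaccard(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Order-insensitive similarity: |A ∩ B| / |A ∪ B| over the returned items."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty result lists are treated as identical
    return len(sa & sb) / len(sa | sb)


def rbo_ext(a: Sequence[Hashable], b: Sequence[Hashable], p: float = 0.9) -> float:
    """Extrapolated Rank-Biased Overlap (Webber et al., 2010).

    Illustrative sketch: assumes both rankings are duplicate-free and
    compares them to the common depth k = min(len(a), len(b)).
    A smaller p puts more weight on the top ranks.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    k = min(len(a), len(b))
    if k == 0:
        raise ValueError("both rankings must be non-empty")

    seen_a: set[Hashable] = set()
    seen_b: set[Hashable] = set()
    overlap = 0         # X_d = |a[:d] ∩ b[:d]|, updated incrementally
    weighted_sum = 0.0  # sum over d = 1..k of p^(d-1) * X_d / d
    for d in range(1, k + 1):
        x, y = a[d - 1], b[d - 1]
        if x == y:
            overlap += 1
        else:
            # Each new item adds to the overlap if the other prefix already holds it.
            if x in seen_b:
                overlap += 1
            if y in seen_a:
                overlap += 1
        seen_a.add(x)
        seen_b.add(y)
        weighted_sum += p ** (d - 1) * overlap / d

    # Extrapolate by assuming the agreement observed at depth k persists below k.
    return (1 - p) * weighted_sum + (overlap / k) * p ** k


if __name__ == "__main__":
    top = ["u1", "u2", "u3"]
    print(jaccard(top, top[::-1]), rbo_ext(top, top[::-1]))
```

For a three-item list and its reverse, Jaccard similarity is 1.0 while `rbo_ext` gives about 0.855, which shows why an order-insensitive measure alone cannot separate engines that return the same results in a different order.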
Related papers
- Tree Search for Language Model Agents
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
Date: 2024-07-01T17:07:55Z
- A comparison of online search engine autocompletion in Google and Baidu
We study the characteristics of search auto-completions in two different linguistic and cultural contexts: Baidu and Google.
We find differences between the two search engines in the way they suppress or modify original queries.
Our study highlights the need for more refined, culturally sensitive moderation strategies in current language technologies.
Date: 2024-05-03T08:17:04Z
- The Use of Generative Search Engines for Knowledge Work and Complex Tasks
We analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search.
Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.
Date: 2024-03-19T18:17:46Z
- GEO: Generative Engine Optimization
We formalize a unified framework for generative engines (GEs), which use large language models (LLMs) to gather information from multiple sources and summarize it to answer user queries.
We introduce Generative Engine Optimization (GEO), the first paradigm designed to help content creators improve the visibility of their content in generative engine responses.
Date: 2023-11-16T10:06:09Z
- User Attitudes to Content Moderation in Web Search
We examine the levels of support for different moderation practices applied to potentially misleading and/or potentially offensive content in web search.
We find that the most supported practice is informing users about potentially misleading or offensive content, and the least supported one is the complete removal of search results.
More conservative users and users with lower levels of trust in web search results are more likely to be against content moderation in web search.
Date: 2023-10-05T10:57:15Z
- Evaluating Verifiability in Generative Search Engines
Generative search engines directly generate responses to user queries, along with in-line citations.
We conduct human evaluation to audit four popular generative search engines.
We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations.
Date: 2023-04-19T17:56:12Z
- NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost
We describe NeuralSearchX, a metasearch engine based on a multi-purpose large reranking model to merge results and highlight sentences.
We show that our design choices led to a much more cost-effective system with competitive queries per second (QPS) while achieving close to state-of-the-art results on a wide range of public benchmarks.
Date: 2022-10-26T16:36:53Z
- Search engine effects on news consumption: ranking and representativeness outweigh familiarity in news selection
We analyze three competing factors, two algorithmic (ranking and representativeness) and one psychological (familiarity) that could influence the selection of news articles that appear in search results.
Our results demonstrate the steering power of the algorithmic factors on news consumption as compared to familiarity.
We confirm that Google Search drives individuals to unfamiliar sources and find that it increases the diversity of the political audience to news sources.
Date: 2022-06-17T06:30:56Z
- Exposing Query Identification for Search Transparency
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
Date: 2021-10-14T20:19:27Z
- The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines
We look at the text search results for "us elections", "donald trump", "joe biden" and "bernie sanders" queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex.
Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents.
Date: 2021-05-03T11:18:19Z