Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses
- URL: http://arxiv.org/abs/2410.22349v1
- Date: Tue, 15 Oct 2024 00:50:31 GMT
- Title: Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses
- Authors: Pranav Narayanan Venkit, Philippe Laban, Yilun Zhou, Yixin Mao, Chien-Sheng Wu
- Abstract summary: Large Language Model (LLM)-based applications are graduating from research prototypes to products serving millions of users.
A prominent example is the appearance of Answer Engines: LLM-based generative search engines supplanting traditional search engines.
- Score: 32.49468716515915
- Abstract: Large Language Model (LLM)-based applications are graduating from research prototypes to products serving millions of users, influencing how people write and consume information. A prominent example is the appearance of Answer Engines: LLM-based generative search engines supplanting traditional search engines. Answer engines not only retrieve relevant sources to a user query but synthesize answer summaries that cite the sources. To understand these systems' limitations, we first conducted a study with 21 participants, evaluating interactions with answer vs. traditional search engines and identifying 16 answer engine limitations. From these insights, we propose 16 answer engine design recommendations, linked to 8 metrics. An automated evaluation implementing our metrics on three popular engines (You.com, Perplexity.ai, BingChat) quantifies common limitations (e.g., frequent hallucination, inaccurate citation) and unique features (e.g., variation in answer confidence), with results mirroring user study insights. We release our Answer Engine Evaluation benchmark (AEE) to facilitate transparent evaluation of LLM-based applications.
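As a concrete illustration of the kind of automated metric the abstract describes, the sketch below computes a simple citation-accuracy score: the share of cited answer statements that at least one of their cited sources actually supports. This is not the released AEE benchmark code; the data layout and the pluggable entailment judge are assumptions for illustration.

```python
# A minimal sketch of a citation-accuracy check in the spirit of the metrics the
# abstract describes (NOT the released AEE benchmark code). The data layout and the
# pluggable `entails` judge are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CitedStatement:
    text: str               # one sentence from the generated answer
    source_ids: List[str]   # IDs of the sources it cites, e.g. ["[1]", "[3]"]


def citation_accuracy(
    statements: List[CitedStatement],
    sources: Dict[str, str],              # source ID -> retrieved document text
    entails: Callable[[str, str], bool],  # (premise, hypothesis) -> bool, e.g. a wrapped NLI model
) -> float:
    """Fraction of cited statements supported by at least one of their cited sources."""
    if not statements:
        return 0.0
    supported = sum(
        1
        for stmt in statements
        if any(entails(sources[sid], stmt.text) for sid in stmt.source_ids if sid in sources)
    )
    return supported / len(statements)


# Toy usage with a naive substring "judge" standing in for a real NLI model.
if __name__ == "__main__":
    docs = {"[1]": "Aspirin can reduce the risk of heart attack in some adults."}
    stmts = [CitedStatement("Aspirin can reduce the risk of heart attack.", ["[1]"])]
    print(citation_accuracy(stmts, docs, lambda premise, hyp: hyp[:-1].lower() in premise.lower()))
```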
Related papers
- The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations [40.498553309980764]
We study the interplay between verifiability and utility of information-sharing tools.
We find that users prefer search engines over large language models for high-stakes queries.
arXiv Detail & Related papers (2024-11-26T12:34:52Z)
- Search Engines, LLMs or Both? Evaluating Information Seeking Strategies for Answering Health Questions [3.8984586307450093]
We compare different web search engines, Large Language Models (LLMs) and retrieval-augmented (RAG) approaches.
We observe that the quality of the webpages that potentially answer a health question does not decline as we move further down the ranked lists.
According to our evaluation, web engines are less accurate than LLMs in finding correct answers to health questions.
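To make the strategies being compared concrete, the sketch below composes a retrieval-augmented answer from a generic search function and a generic LLM, alongside an LLM-only baseline. The callables and prompt wording are placeholders, not the study's actual setup.

```python
# A minimal sketch of the retrieval-augmented (RAG) strategy compared in the study,
# next to an LLM-only baseline. `search` and `generate` are placeholder callables
# standing in for a real web search API and a real LLM; the prompt is illustrative.
from typing import Callable, List


def rag_answer(
    question: str,
    search: Callable[[str, int], List[str]],  # (query, k) -> top-k page snippets
    generate: Callable[[str], str],           # prompt -> model response
    k: int = 5,
) -> str:
    snippets = search(question, k)
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    prompt = (
        "Answer the health question using only the numbered sources below, "
        "and cite the source numbers you rely on.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)


def llm_only_answer(question: str, generate: Callable[[str], str]) -> str:
    # Same generator, no retrieved context: the stand-alone LLM condition.
    return generate(f"Question: {question}\nAnswer:")
```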
arXiv Detail & Related papers (2024-07-17T10:40:39Z)
- Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models [21.115495457454365]
uRAG is a framework with a unified retrieval engine that serves multiple downstream retrieval-augmented generation (RAG) systems.
We build a large-scale experimentation ecosystem consisting of 18 RAG systems that engage in training and 18 unknown RAG systems that use uRAG as new users of the search engine.
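A minimal sketch of the core idea follows, under assumptions rather than uRAG's actual implementation: a single retrieval engine behind one interface serves many downstream RAG systems and pools their feedback so one ranker can be trained for all of them.

```python
# A minimal sketch (an assumption, not uRAG's actual implementation) of a unified
# retrieval engine: one shared interface serves many downstream RAG systems and
# pools their feedback as training signal for a single ranker.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class UnifiedRetrievalEngine:
    def __init__(self, score: Callable[[str, str], float], corpus: Dict[str, str]):
        self.score = score                   # (query, document text) -> relevance score
        self.corpus = corpus                 # doc_id -> document text
        self.query_log = defaultdict(list)   # system_id -> issued queries
        self.feedback = defaultdict(list)    # system_id -> (query, doc_id, reward) triples

    def search(self, system_id: str, query: str, k: int = 10) -> List[Tuple[str, float]]:
        self.query_log[system_id].append(query)
        ranked = sorted(
            ((doc_id, self.score(query, text)) for doc_id, text in self.corpus.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        return ranked[:k]

    def report_feedback(self, system_id: str, query: str, doc_id: str, reward: float) -> None:
        # Downstream RAG systems report whether a retrieved document helped their task,
        # giving the shared engine one pool of training signal across all of them.
        self.feedback[system_id].append((query, doc_id, reward))
```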
arXiv Detail & Related papers (2024-04-30T19:51:37Z)
- Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions [89.35345649303451]
Generative search engines have the potential to transform how people seek information online.
However, responses generated by existing large language model (LLM)-backed generative search engines may not always be accurate.
Retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system.
arXiv Detail & Related papers (2024-02-25T11:22:19Z)
- GEO: Generative Engine Optimization [50.45232692363787]
We formalize a unified framework for generative engines (GEs).
GEs use large language models (LLMs) to gather and summarize information to answer user queries.
Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them.
We introduce Generative Engine Optimization (GEO), the first paradigm to help content creators improve the visibility of their content in generative engine responses.
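The sketch below shows one plausible way to score a source's visibility in a generative engine's answer, in the spirit of GEO's goal; the position weighting and normalization here are illustrative assumptions, not GEO's defined metrics.

```python
# A minimal sketch of a plausible visibility metric: the position-weighted share of
# answer words attributed to one creator's source. Weighting and normalization are
# illustrative assumptions, not GEO's actual impression metrics.
from typing import List, Tuple


def visibility(sentences: List[Tuple[str, List[str]]], source_id: str) -> float:
    """sentences: (sentence_text, cited_source_ids) pairs, in answer order."""
    total, attributed = 0.0, 0.0
    n = len(sentences)
    for idx, (text, cites) in enumerate(sentences):
        weight = (n - idx) / n            # earlier sentences count more
        words = len(text.split()) * weight
        total += words
        if source_id in cites:
            attributed += words
    return attributed / total if total else 0.0
```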
arXiv Detail & Related papers (2023-11-16T10:06:09Z)
- Automatic Evaluation of Attribution by Large Language Models [24.443271739599194]
We investigate the automatic evaluation of attribution given by large language models (LLMs).
We begin by defining different types of attribution errors, and then explore two approaches for automatic evaluation.
We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing.
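As a sketch of what such automatic evaluation can look like (assumptions, not the paper's method), the snippet below asks a judge model whether a cited reference supports, contradicts, or is insufficient for a generated statement.

```python
# A minimal sketch (assumptions, not the paper's method) of prompting-based attribution
# checking: given a generated statement and the reference it cites, ask a judge model
# whether the reference supports, contradicts, or is insufficient for the statement.
from typing import Callable

LABELS = ("supported", "contradicted", "not_enough_info")  # illustrative label set


def check_attribution(statement: str, reference: str, judge: Callable[[str], str]) -> str:
    """`judge` is a placeholder for an LLM or NLI model returning one of LABELS."""
    prompt = (
        "Reference:\n" + reference + "\n\n"
        "Statement:\n" + statement + "\n\n"
        "Does the reference support the statement? "
        f"Answer with exactly one of: {', '.join(LABELS)}."
    )
    label = judge(prompt).strip().lower()
    return label if label in LABELS else "not_enough_info"
```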
arXiv Detail & Related papers (2023-05-10T16:58:33Z)
- Evaluating Verifiability in Generative Search Engines [70.59477647085387]
Generative search engines directly generate responses to user queries, along with in-line citations.
We conduct human evaluation to audit four popular generative search engines.
We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations.
arXiv Detail & Related papers (2023-04-19T17:56:12Z)
- Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production [98.98161995555485]
We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation.
As the core module, a query producer is used to generate queries from a dialogue context to interact with a search engine.
Experiments show that our query producer achieves R@1 and R@5 rates of 62.4% and 74.8%, respectively, for retrieving gold knowledge.
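For reference, the reported R@1/R@5 numbers correspond to a standard Recall@k computation; the sketch below shows it under an assumed data layout (it is not the paper's evaluation code).

```python
# A minimal sketch of Recall@k, the metric behind the reported R@1/R@5 numbers: the
# fraction of examples for which a gold knowledge passage appears in the top-k results
# retrieved for the produced query. The data layout is an assumption.
from typing import List, Sequence, Set


def recall_at_k(retrieved: Sequence[List[str]], gold: Sequence[Set[str]], k: int) -> float:
    """retrieved[i]: ranked passage IDs for example i; gold[i]: gold passage IDs."""
    hits = sum(1 for ranked, answers in zip(retrieved, gold) if set(ranked[:k]) & answers)
    return hits / len(retrieved) if retrieved else 0.0


# Toy example: R@1 = 0.5 and R@5 = 1.0 over two dialogue turns.
if __name__ == "__main__":
    retrieved = [["p3", "p7", "p1"], ["p2", "p9", "p4"]]
    gold = [{"p3"}, {"p4"}]
    print(recall_at_k(retrieved, gold, 1), recall_at_k(retrieved, gold, 5))
```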
arXiv Detail & Related papers (2023-02-16T01:58:10Z)
- Brain-inspired Search Engine Assistant based on Knowledge Graph [53.89429854626489]
DeveloperBot is a brain-inspired search engine assistant based on a knowledge graph.
It constructs a multi-layer query graph by splitting a complex multi-constraint query into several ordered constraints.
It then models the constraint reasoning process as a subgraph search process inspired by the spreading activation model from cognitive science.
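A minimal sketch of spreading activation over a knowledge graph, the cognitive-science mechanism the summary refers to, follows; the decay, step count, and threshold are illustrative assumptions, not DeveloperBot's actual procedure.

```python
# A minimal sketch of spreading activation over a knowledge graph: seed nodes from the
# query start with activation 1.0, each step spreads a decayed share of activation to
# neighbors, and nodes above a threshold form the candidate answer subgraph.
# Parameters (decay, steps, threshold) are illustrative assumptions.
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def spreading_activation(
    graph: Dict[str, List[str]],   # node -> neighbor nodes
    seeds: Iterable[str],
    decay: float = 0.5,
    steps: int = 3,
    threshold: float = 0.1,
) -> Set[str]:
    activation = defaultdict(float, {s: 1.0 for s in seeds})
    for _ in range(steps):
        spread = defaultdict(float)
        for node, value in activation.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            share = decay * value / len(neighbors)
            for nb in neighbors:
                spread[nb] += share
        for node, value in spread.items():
            activation[node] = max(activation[node], value)
    return {node for node, value in activation.items() if value >= threshold}


# Toy usage: the constraints "python" and "web" both activate "django", so it ends up
# in the candidate subgraph along with the other reachable nodes.
if __name__ == "__main__":
    kg = {"python": ["django", "flask"], "web": ["django", "html"], "django": ["orm"]}
    print(spreading_activation(kg, seeds=["python", "web"]))
```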
arXiv Detail & Related papers (2020-12-25T06:36:11Z)