Evaluating Verifiability in Generative Search Engines
- URL: http://arxiv.org/abs/2304.09848v2
- Date: Mon, 23 Oct 2023 19:11:38 GMT
- Title: Evaluating Verifiability in Generative Search Engines
- Authors: Nelson F. Liu and Tianyi Zhang and Percy Liang
- Abstract summary: Generative search engines directly generate responses to user queries, along with in-line citations.
We conduct human evaluation to audit four popular generative search engines.
We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations.
- Score: 70.59477647085387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative search engines directly generate responses to user queries, along
with in-line citations. A prerequisite trait of a trustworthy generative search
engine is verifiability, i.e., systems should cite comprehensively (high
citation recall; all statements are fully supported by citations) and
accurately (high citation precision; every cite supports its associated
statement). We conduct human evaluation to audit four popular generative search
engines -- Bing Chat, NeevaAI, perplexity.ai, and YouChat -- across a diverse
set of queries from a variety of sources (e.g., historical Google user queries,
dynamically-collected open-ended questions on Reddit, etc.). We find that
responses from existing generative search engines are fluent and appear
informative, but frequently contain unsupported statements and inaccurate
citations: on average, a mere 51.5% of generated sentences are fully supported
by citations and only 74.5% of citations support their associated sentence. We
believe that these results are concerningly low for systems that may serve as a
primary tool for information-seeking users, especially given their facade of
trustworthiness. We hope that our results further motivate the development of
trustworthy generative search engines and help researchers and users better
understand the shortcomings of existing commercial systems.
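The two metrics named in the abstract can be illustrated with a short sketch. This is a simplified reading of the abstract's wording, not the authors' evaluation protocol: citation recall is taken as the fraction of sentences judged fully supported by their citations, and citation precision as the fraction of citations judged to support their associated sentence. The `Sentence` class, its field names, and the example judgments are hypothetical and stand in for human annotations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    """One generated sentence with hypothetical human judgments attached."""
    text: str
    fully_supported: bool  # assumed label: is the sentence fully supported by its citations?
    citation_supports: List[bool] = field(default_factory=list)  # one assumed label per citation


def citation_recall(sentences: List[Sentence]) -> float:
    """Fraction of sentences judged fully supported by their citations."""
    if not sentences:
        return 0.0
    return sum(s.fully_supported for s in sentences) / len(sentences)


def citation_precision(sentences: List[Sentence]) -> float:
    """Fraction of citations judged to support their associated sentence."""
    judgments = [j for s in sentences for j in s.citation_supports]
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)


# Made-up annotations for a four-sentence response (illustrative only).
response = [
    Sentence("Claim A.", fully_supported=True, citation_supports=[True]),
    Sentence("Claim B.", fully_supported=False, citation_supports=[False]),
    Sentence("Claim C.", fully_supported=True, citation_supports=[True, True]),
    Sentence("Claim D.", fully_supported=False, citation_supports=[]),
]

recall = citation_recall(response)        # 2 of 4 sentences fully supported
precision = citation_precision(response)  # 3 of 4 citations support their sentence
print(f"citation recall = {recall:.2f}, citation precision = {precision:.2f}")
```

In this toy example, a sentence with no citation at all (Claim D) lowers recall but has no effect on precision, which is why the abstract reports the two numbers separately.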
Related papers
- Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses [32.49468716515915]
Large Language Model (LLM)-based applications are graduating from research prototypes to products serving millions of users.
A prominent example is the appearance of Answer Engines: LLM-based generative search engines supplanting traditional search engines.
arXiv Detail & Related papers (2024-10-15T00:50:31Z)
- Generative AI Search Engines as Arbiters of Public Knowledge: An Audit of Bias and Authority [2.860575804107195]
This paper reports on an audit study of generative AI systems (ChatGPT, Bing Chat, and Perplexity) which investigates how these new search engines construct responses.
We collected system responses using a set of 48 authentic queries for 4 topics over a 7-day period and analyzed the data using sentiment analysis, inductive coding and source classification.
Results provide an overview of the nature of system responses across these systems and provide evidence of sentiment bias based on the queries and topics, and commercial and geographic bias in sources.
arXiv Detail & Related papers (2024-05-22T22:09:32Z)
- Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions [89.35345649303451]
Generative search engines have the potential to transform how people seek information online.
However, responses generated by existing large language model (LLM)-backed generative search engines may not always be accurate.
Retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system.
arXiv Detail & Related papers (2024-02-25T11:22:19Z)
- Evaluating Generative Ad Hoc Information Retrieval [58.800799175084286]
Generative retrieval systems often directly return a grounded generated text as a response to a query.
Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval.
arXiv Detail & Related papers (2023-11-08T14:05:00Z)
- Social Commonsense-Guided Search Query Generation for Open-Domain Knowledge-Powered Conversations [66.16863141262506]
We present a novel approach that focuses on generating internet search queries guided by social commonsense.
Our proposed framework addresses passive user interactions by integrating topic tracking, commonsense response generation and instruction-driven query generation.
arXiv Detail & Related papers (2023-10-22T16:14:56Z)
- Enabling Large Language Models to Generate Text with Citations [37.64884969997378]
Large language models (LLMs) have emerged as a widely-used tool for information seeking.
Our aim is to allow LLMs to generate text with citations, improving their factual correctness and verifiability.
We propose ALCE, the first benchmark for Automatic LLMs' Citation Evaluation.
arXiv Detail & Related papers (2023-05-24T01:53:49Z)
- WebCPM: Interactive Web Search for Chinese Long-form Question Answering [104.676752359777]
Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses.
We introduce WebCPM, the first Chinese LFQA dataset.
We collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions.
arXiv Detail & Related papers (2023-05-11T14:47:29Z)
- Search-Engine-augmented Dialogue Response Generation with Cheaply Supervised Query Production [98.98161995555485]
We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation.
As the core module, a query producer is used to generate queries from a dialogue context to interact with a search engine.
Experiments show that our query producer can achieve R@1 and R@5 rates of 62.4% and 74.8% for retrieving gold knowledge.
arXiv Detail & Related papers (2023-02-16T01:58:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.