Search Engines, LLMs or Both? Evaluating Information Seeking Strategies for Answering Health Questions
- URL: http://arxiv.org/abs/2407.12468v2
- Date: Thu, 18 Jul 2024 10:11:09 GMT
- Title: Search Engines, LLMs or Both? Evaluating Information Seeking Strategies for Answering Health Questions
- Authors: Marcos Fernández-Pichel, Juan C. Pichel, David E. Losada,
- Abstract summary: We compare different web search engines, Large Language Models (LLMs) and retrieval-augmented (RAG) approaches.
We observed that the quality of webpages potentially responding to a health question does not decline as we navigate further down the ranked lists.
According to our evaluation, web engines are less accurate than LLMs in finding correct answers to health questions.
- Score: 3.8984586307450093
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Search engines have traditionally served as primary tools for information seeking. However, the new Large Language Models (LLMs) have recently demonstrated remarkable capabilities in multiple tasks and, specifically, their adoption as question answering systems is becoming increasingly prevalent. It is expected that LLM-based conversational systems and traditional web engines will continue to coexist in the future, supporting end users in various ways. But there is a need for more scientific research on the effectiveness of both types of systems in facilitating accurate information seeking. In this study, we focus on their merits in answering health questions. We conducted an extensive study comparing different web search engines, LLMs and retrieval-augmented (RAG) approaches. Our research reveals intriguing conclusions. For example, we observed that the quality of webpages potentially responding to a health question does not decline as we navigate further down the ranked lists. However, according to our evaluation, web engines are less accurate than LLMs in finding correct answers to health questions. On the other hand, LLMs are quite sensitive to the input prompts, and we also found out that RAG leads to highly effective information seeking methods.
Related papers
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations [40.498553309980764]
We study the interplay between verifiability and utility of information-sharing tools.
We find that users prefer search engines over large language models for high-stakes queries.
arXiv Detail & Related papers (2024-11-26T12:34:52Z) - ELOQ: Resources for Enhancing LLM Detection of Out-of-Scope Questions [52.33835101586687]
Large Language Models (LLMs) are widely used in Conversational AI systems to generate responses to user inquiries.
We propose a guided hallucination-based method to efficiently generate a diverse set of out-of-scope questions from a given document corpus.
arXiv Detail & Related papers (2024-10-18T16:11:29Z) - Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses [32.49468716515915]
Large Language Model (LLM)-based applications are graduating from research prototypes to products serving millions of users.
A prominent example is the appearance of Answer Engines: LLM-based generative search engines supplanting traditional search engines.
arXiv Detail & Related papers (2024-10-15T00:50:31Z) - Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z) - The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation [1.2839205715237014]
Large Language Models (LLMs) have the potential to significantly improve personal health management for chronic conditions.
LLMs generate responses based on patterns learned from diverse internet data.
Retrieval Augmented Generation (RAG) can help mitigate hallucinations and inaccuracies in RAG responses.
arXiv Detail & Related papers (2024-07-25T13:47:01Z) - Answering real-world clinical questions using large language model based systems [2.2605659089865355]
Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD)
We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability.
arXiv Detail & Related papers (2024-06-29T22:39:20Z) - When Search Engine Services meet Large Language Models: Visions and Challenges [53.32948540004658]
This paper conducts an in-depth examination of how integrating Large Language Models with search engines can mutually benefit both technologies.
We focus on two main areas: using search engines to improve LLMs (Search4LLM) and enhancing search engine functions using LLMs (LLM4Search)
arXiv Detail & Related papers (2024-06-28T03:52:13Z) - SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG)
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z) - Ranking Manipulation for Conversational Search Engines [7.958276719131612]
We study the impact of prompt injections on the ranking order of sources referenced by conversational search engines.
We present a tree-of-attacks-based jailbreaking technique which reliably promotes low-ranked products.
arXiv Detail & Related papers (2024-06-05T19:14:21Z) - CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval [52.134133938779776]
We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate.
Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn.
arXiv Detail & Related papers (2024-04-28T18:21:31Z) - JMLR: Joint Medical LLM and Retrieval Training for Enhancing Reasoning and Professional Question Answering Capability [8.476124605775976]
Large Language Models (LLMs) have demonstrated a remarkable potential in medical knowledge acquisition and question-answering.
LLMs can potentially hallucinate and yield factually incorrect outcomes, even with domain-specific pretraining.
We introduce JMLR (for Jointly trains LLM and information Retrieval) during the fine-tuning phase to address hallucinations.
arXiv Detail & Related papers (2024-02-27T21:01:41Z) - Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model.
We employ a proxy model which has far fewer parameters, and take its answers as answers.
Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
arXiv Detail & Related papers (2024-02-19T11:11:08Z) - FreshLLMs: Refreshing Large Language Models with Search Engine
Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z) - MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering [45.84961106102445]
Large Language Models (LLMs) often perform poorly on domain-specific tasks such as medical question answering (QA)
We propose a comprehensive retrieval strategy to extract medical facts from an external knowledge base, and then inject them into the LLM's query prompt.
Our retrieval-augmented Vicuna-7B model exhibited an accuracy improvement from 44.46% to 48.54%.
arXiv Detail & Related papers (2023-09-27T21:26:03Z) - Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering [48.17095875619711]
We present a system called LLMs Augmented with Medical Textbooks (LLM-AMT)
LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules.
We found that medical textbooks as a retrieval corpus is proven to be a more effective knowledge database than Wikipedia in the medical domain.
arXiv Detail & Related papers (2023-09-05T13:39:38Z) - How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z) - Synergistic Interplay between Search and Large Language Models for
Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.