Retrieval-augmented systems can be dangerous medical communicators
- URL: http://arxiv.org/abs/2502.14898v1
- Date: Tue, 18 Feb 2025 01:57:02 GMT
- Title: Retrieval-augmented systems can be dangerous medical communicators
- Authors: Lionel Wong, Ayman Ali, Raymond Xiong, Shannon Zejiang Shen, Yoon Kim, Monica Agrawal
- Abstract summary: Patients have long sought health information online, and increasingly, they are turning to generative AI to answer their health-related queries. Retrieval-augmented generation and citation grounding have been widely promoted as methods to reduce hallucinations and improve the accuracy of AI-generated responses. This paper argues that even when these methods produce literally accurate content drawn from source documents sans hallucinations, they can still be highly misleading.
- Score: 21.371504193281226
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Patients have long sought health information online, and increasingly, they are turning to generative AI to answer their health-related queries. Given the high stakes of the medical domain, techniques like retrieval-augmented generation and citation grounding have been widely promoted as methods to reduce hallucinations and improve the accuracy of AI-generated responses, and have been adopted into search engines. This paper argues that even when these methods produce literally accurate content drawn from source documents sans hallucinations, they can still be highly misleading. Patients may derive significantly different interpretations from AI-generated outputs than they would from reading the original source material, let alone consulting a knowledgeable clinician. Through a large-scale query analysis on topics including disputed diagnoses and procedure safety, we support our argument with quantitative and qualitative evidence of the suboptimal answers resulting from current systems. In particular, we highlight how these models tend to decontextualize facts, omit critical relevant sources, and reinforce patient misconceptions or biases. We propose a series of recommendations -- such as the incorporation of communication pragmatics and enhanced comprehension of source documents -- that could help mitigate these issues and extend beyond the medical domain.
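To make the failure mode concrete, the toy Python sketch below (an illustration, not the authors' system; the corpus and query are invented) shows how an extractive, citation-grounded pipeline can return a verbatim-accurate sentence while dropping the context that gives it meaning:

```python
# Toy corpus: one passage where a risk statement is immediately qualified.
CORPUS = [
    "Procedure X is associated with serious complications in rare cases. "
    "Large trials report complication rates below 0.1 percent, and "
    "clinicians consider it safe for most patients.",
]

def retrieve(query: str, corpus: list[str]) -> str:
    """Naive retrieval: return the passage with the most query-word overlap."""
    words = set(query.lower().split())
    return max(corpus, key=lambda p: len(words & set(p.lower().split())))

def extract_snippet(passage: str, query: str) -> str:
    """Citation-grounding shortcut: quote the single best-matching sentence,
    silently discarding the neighboring sentence that qualifies it."""
    words = set(query.lower().split())
    sentences = [s.strip() for s in passage.split(". ") if s.strip()]
    return max(sentences, key=lambda s: len(words & set(s.lower().split())))

query = "is procedure X associated with serious complications"
print(extract_snippet(retrieve(query, CORPUS), query))
# -> "Procedure X is associated with serious complications in rare cases"
# The quote is literally accurate and traceable to the source, yet without
# the next sentence a patient may read it as a warning against a procedure
# the source itself calls safe -- the decontextualization the paper studies.
```

The snippet is grounded in the sense the paper critiques: every word comes from the source, but the selection step, not the generation step, does the misleading.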
Related papers
- Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.
We propose a novel approach utilizing structured medical reasoning.
Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models (see the sketch below).
arXiv Detail & Related papers (2025-03-05T05:24:55Z)
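A rough illustration of the structured-output idea in the entry above (the schema and field names are assumptions, not the authors' format): require the model to fill a fixed reasoning schema, then validate it before showing anything to the user.

```python
import json

# Hypothetical schema for a structured medical answer (assumed, not the paper's).
SCHEMA_KEYS = {"findings", "differential", "answer", "evidence"}

def validate_structured_answer(raw: str) -> dict:
    """Parse a model response and reject it unless every schema field is present."""
    parsed = json.loads(raw)
    missing = SCHEMA_KEYS - parsed.keys()
    if missing:
        raise ValueError(f"model output missing fields: {sorted(missing)}")
    return parsed

# A response an LLM might produce when prompted to follow the schema:
model_output = json.dumps({
    "findings": "Episodic headache with photophobia and nausea.",
    "differential": ["migraine", "tension-type headache"],
    "answer": "The symptoms are most consistent with migraine.",
    "evidence": ["ICHD-3 diagnostic criteria for migraine"],
})
print(validate_structured_answer(model_output)["answer"])
```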
- Medical Hallucinations in Foundation Models and Their Impact on Healthcare [53.97060824532454]
Foundation models capable of processing and generating multi-modal data have transformed AI's role in medicine.
We define medical hallucination as any instance in which a model generates misleading medical content.
Our results reveal that inference techniques such as Chain-of-Thought (CoT) and Search Augmented Generation can effectively reduce hallucination rates.
These findings underscore the ethical and practical imperative for robust detection and mitigation strategies.
arXiv Detail & Related papers (2025-02-26T02:30:44Z)
- Iterative Tree Analysis for Medical Critics [5.617649111108429]
Iterative Tree Analysis (ITA) is designed to extract implicit claims from long medical texts and verify each claim through an iterative and adaptive tree-like reasoning process.
Our experiments demonstrate that ITA outperforms previous methods by 10% in detecting factual inaccuracies in complex medical text verification tasks (see the sketch below).
arXiv Detail & Related papers (2025-01-18T03:13:26Z)
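A minimal sketch of the iterative, tree-like verification loop described above; the decomposer and verifier below are stubs standing in for LLM calls, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ClaimNode:
    text: str
    children: list = field(default_factory=list)

def decompose(claim: str) -> list[str]:
    """Stub decomposer: split conjunctive claims. ITA would use an LLM here,
    iterating until each claim is atomic enough to check."""
    return [part.strip() for part in claim.split(" and ")] if " and " in claim else []

def verify_leaf(claim: str, trusted_facts: set[str]) -> bool:
    """Stub verifier: look the atomic claim up in a trusted fact set."""
    return claim in trusted_facts

def verify(node: ClaimNode, trusted_facts: set[str]) -> bool:
    subclaims = decompose(node.text)
    if not subclaims:                               # atomic claim: check directly
        return verify_leaf(node.text, trusted_facts)
    node.children = [ClaimNode(text) for text in subclaims]
    return all(verify(child, trusted_facts) for child in node.children)

facts = {"drug A lowers blood pressure"}
root = ClaimNode("drug A lowers blood pressure and drug A cures diabetes")
print(verify(root, facts))  # False: the second sub-claim fails verification
```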
- Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models [1.03590082373586]
This paper conducts a scoping study of existing techniques for mitigating hallucinations in knowledge-based tasks in general and in the medical domain in particular.
Key methods covered in the paper include Retrieval-Augmented Generation (RAG)-based techniques, iterative feedback loops, supervised fine-tuning, and prompt engineering.
These techniques, while promising in general contexts, require further adaptation and optimization for the medical domain due to its unique demands for up-to-date, specialized knowledge and strict adherence to medical guidelines (one such feedback loop is sketched below).
arXiv Detail & Related papers (2024-08-25T11:09:15Z)
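One of the surveyed mitigation patterns, the iterative feedback loop, can be sketched as follows; `generate` and `find_unsupported` are hypothetical stand-ins for an LLM call and a fact-checker, not a real API.

```python
def generate(question: str, feedback: list[str]) -> str:
    """Stand-in for an LLM call; a real system would fold the feedback
    into the prompt for the next draft."""
    if not feedback:
        return "Aspirin cures migraines."
    return "Aspirin may relieve migraine pain."

def find_unsupported(answer: str, sources: list[str]) -> list[str]:
    """Stand-in checker: flag the answer if no source entails it
    (approximated here by a case-insensitive substring test)."""
    claim = answer.rstrip(".").lower()
    if any(claim in source.lower() for source in sources):
        return []
    return [f"unsupported claim: {answer}"]

def answer_with_feedback(question: str, sources: list[str], max_rounds: int = 3) -> str:
    feedback: list[str] = []
    for _ in range(max_rounds):
        draft = generate(question, feedback)
        issues = find_unsupported(draft, sources)
        if not issues:
            return draft               # every claim is grounded: stop iterating
        feedback.extend(issues)        # otherwise loop again with the critique
    return draft + " (could not be verified)"

sources = ["Trials suggest aspirin may relieve migraine pain in some patients."]
print(answer_with_feedback("Does aspirin help migraines?", sources))
# -> "Aspirin may relieve migraine pain."
```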
- MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models [3.0874677990361246]
Large Language Models (LLMs) have shown impressive capabilities in generating human-like responses.
We propose MedInsight: a novel retrieval framework that augments LLM inputs with relevant background information.
Experiments on the MTSamples dataset validate MedInsight's effectiveness in generating contextually appropriate medical responses.
arXiv Detail & Related papers (2024-03-13T15:20:30Z)
- A survey of recent methods for addressing AI fairness and bias in biomedicine [48.46929081146017]
Artificial intelligence systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender.
We surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV).
We performed a literature search on PubMed, ACM digital library, and IEEE Xplore of relevant articles published between January 2018 and December 2023 using multiple combinations of keywords.
We reviewed other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness.
arXiv Detail & Related papers (2024-02-13T06:38:46Z)
- Leveraging Generative AI for Clinical Evidence Summarization Needs to Ensure Trustworthiness [47.51360338851017]
Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence.
The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information.
Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating this arduous task.
arXiv Detail & Related papers (2023-11-19T03:29:45Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z)
- On the Combined Use of Extrinsic Semantic Resources for Medical Information Search [0.0]
We develop a framework to highlight and expand head medical concepts in verbose medical queries.
We also build semantically enhanced inverted index documents.
To demonstrate the effectiveness of the proposed approach, we conducted several experiments over the CLEF 2014 dataset (see the sketch below).
arXiv Detail & Related papers (2020-05-17T14:18:04Z)
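A toy sketch of the two components in the entry above, query expansion via an external semantic resource and a semantically enhanced inverted index; the synonym table and documents are invented for illustration.

```python
from collections import defaultdict

# Stand-in for an external semantic resource such as UMLS or MeSH:
SYNONYMS = {"myocardial infarction": {"heart attack", "mi"}}

DOCS = {
    1: "patient admitted after acute heart attack",
    2: "guidelines for treating seasonal allergies",
}

def expand(head_concept: str) -> set[str]:
    """Expand a head medical concept with its known synonym variants."""
    return {head_concept} | SYNONYMS.get(head_concept, set())

# Build the inverted index once: token -> ids of documents containing it.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in DOCS.items():
    for token in text.split():
        index[token].add(doc_id)

def search(head_concept: str) -> set[int]:
    """Match every expanded variant against the index (union; real systems rank)."""
    hits: set[int] = set()
    for variant in expand(head_concept):
        for word in variant.split():
            hits |= index.get(word, set())
    return hits

print(search("myocardial infarction"))  # {1}: matched via the "heart attack" synonym
```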