Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models
- URL: http://arxiv.org/abs/2408.13808v1
- Date: Sun, 25 Aug 2024 11:09:15 GMT
- Title: Towards Reliable Medical Question Answering: Techniques and Challenges in Mitigating Hallucinations in Language Models
- Authors: Duy Khoa Pham, Bao Quoc Vo
- Abstract summary: This paper conducts a scoping study of existing techniques for mitigating hallucinations in knowledge-based tasks in general, and especially in the medical domain.
Key methods covered in the paper include Retrieval-Augmented Generation (RAG)-based techniques, iterative feedback loops, supervised fine-tuning, and prompt engineering.
These techniques, while promising in general contexts, require further adaptation and optimization for the medical domain due to its unique demands for up-to-date, specialized knowledge and strict adherence to medical guidelines.
- Score: 1.03590082373586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of large language models (LLMs) has significantly impacted various domains, including healthcare and biomedicine. However, the phenomenon of hallucination, where LLMs generate outputs that deviate from factual accuracy or context, poses a critical challenge, especially in high-stakes domains. This paper conducts a scoping study of existing techniques for mitigating hallucinations in knowledge-based tasks in general, and especially in the medical domain. Key methods covered in the paper include Retrieval-Augmented Generation (RAG)-based techniques, iterative feedback loops, supervised fine-tuning, and prompt engineering. These techniques, while promising in general contexts, require further adaptation and optimization for the medical domain due to its unique demands for up-to-date, specialized knowledge and strict adherence to medical guidelines. Addressing these challenges is crucial for developing trustworthy AI systems that enhance clinical decision-making and patient safety, as well as the accuracy of biomedical scientific research.
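As a concrete illustration of the RAG direction surveyed here, the sketch below grounds a medical answer in retrieved guideline text before generation. The snippets, the toy word-overlap retriever, and the `call_llm` placeholder are illustrative assumptions, not the paper's implementation.

```python
# Minimal retrieval-augmented generation (RAG) sketch for medical QA.
# `call_llm` is a hypothetical placeholder for any chat-completion backend;
# the snippets and the scoring scheme are illustrative only.

def call_llm(prompt: str) -> str:
    """Placeholder: route the prompt to whichever LLM backend is available."""
    raise NotImplementedError("plug in an actual model call here")

GUIDELINE_SNIPPETS = [
    "Metformin is a first-line pharmacologic therapy for type 2 diabetes.",
    "ACE inhibitors are contraindicated in pregnancy.",
    "Adults should receive a tetanus booster every 10 years.",
]

def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    """Rank snippets by simple word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(snippets, key=lambda s: -len(q_words & set(s.lower().split())))
    return scored[:k]

def answer(question: str) -> str:
    """Build a grounded prompt from retrieved context and let the model answer."""
    context = "\n".join(retrieve(question, GUIDELINE_SNIPPETS))
    prompt = (
        "Answer the medical question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)
```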
Related papers
- Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications [45.935798913942904]
We propose an innovative framework that combines structured biomedical knowledge with large language models (LLMs).
Our system builds a comprehensive knowledge graph by identifying and refining causal relationships and named entities from medical abstracts related to age-related macular degeneration (AMD).
Using a vector-based retrieval process and a locally deployed language model, our framework produces responses that are both contextually relevant and verifiable, with direct references to clinical evidence.
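A rough sketch of how such a pipeline could be wired together is given below; the triple store, the stand-in retrieval helpers, and the prompt format are assumptions for illustration, and a real deployment would query Weaviate and a locally deployed LLM instead.

```python
# Illustrative sketch of a knowledge-graph-driven RAG pipeline in the spirit
# of the framework above. The retrieval helpers and the triple store are
# stand-ins, not the actual Weaviate or model integration.

from dataclasses import dataclass

@dataclass
class Triple:
    subject: str
    relation: str
    obj: str
    source: str  # e.g. identifier of the abstract the triple was extracted from

KG = [
    Triple("oxidative stress", "contributes_to", "AMD progression", "PMID:0000001"),
    Triple("anti-VEGF therapy", "treats", "neovascular AMD", "PMID:0000002"),
]

def vector_retrieve(question: str, k: int = 3) -> list[str]:
    """Stand-in for a near-text query against a vector store."""
    return ["(retrieved abstract passage)"] * k

def kg_lookup(question: str) -> list[Triple]:
    """Return triples whose subject or object appears in the question (toy match)."""
    q = question.lower()
    return [t for t in KG if t.subject in q or t.obj.lower() in q]

def build_prompt(question: str) -> str:
    """Combine vector-retrieved passages and KG facts, asking for cited answers."""
    passages = "\n".join(vector_retrieve(question))
    facts = "\n".join(
        f"{t.subject} --{t.relation}--> {t.obj} [{t.source}]" for t in kg_lookup(question)
    )
    return (
        "Answer with direct references to the evidence IDs.\n"
        f"Evidence passages:\n{passages}\n\nKnowledge-graph facts:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )
```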
arXiv Detail & Related papers (2025-02-16T12:52:28Z)
- From large language models to multimodal AI: A scoping review on the potential of generative AI in medicine [40.23383597339471]
Multimodal AI can integrate diverse data modalities, including imaging, text, and structured data, within a single model.
This scoping review explores the evolution of multimodal AI, highlighting its methods, applications, datasets, and evaluation in clinical settings.
Our findings underscore a shift from unimodal to multimodal approaches, driving innovations in diagnostic support, medical report generation, drug discovery, and conversational AI.
arXiv Detail & Related papers (2025-02-13T11:57:51Z)
- A Review on Scientific Knowledge Extraction using Large Language Models in Biomedical Sciences [1.8308043661908204]
This paper reviews the state-of-the-art applications of large language models (LLMs) in the biomedical domain.
LLMs demonstrate remarkable potential, but significant challenges remain, including issues related to hallucinations, contextual understanding, and the ability to generalize.
We aim to improve access to medical literature and facilitate meaningful discoveries in healthcare.
arXiv Detail & Related papers (2024-12-04T18:26:13Z)
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
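The sort of robustness probe this benchmark enables can be sketched as follows; `rag_answer`, the distractor passages, and the scoring rule are hypothetical placeholders and do not reproduce MedRGB itself.

```python
# Hypothetical robustness probe in the spirit of MedRGB: mix distractor or
# misinformation passages into the retrieved context and measure how often
# the model still answers correctly. `rag_answer` is assumed to exist
# (e.g. a pipeline like the sketches above).

import random

def perturbed_accuracy(qa_pairs, rag_answer, distractors, noise_ratio=0.5, seed=0):
    """Fraction of questions answered correctly when noisy passages are injected."""
    rng = random.Random(seed)
    correct = 0
    for question, gold in qa_pairs:
        # Sample a subset of distractor passages to inject alongside real context.
        k = min(len(distractors), max(1, int(noise_ratio * len(distractors))))
        noisy_context = rng.sample(distractors, k=k)
        answer = rag_answer(question, extra_context=noisy_context)
        # Crude string-containment scoring; a real benchmark uses stricter matching.
        correct += int(gold.lower() in answer.lower())
    return correct / len(qa_pairs)
```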
arXiv Detail & Related papers (2024-11-14T06:19:18Z)
- Toward Large Language Models as a Therapeutic Tool: Comparing Prompting Techniques to Improve GPT-Delivered Problem-Solving Therapy [6.952909762512736]
We examine the effects of prompt engineering to guide Large Language Models (LLMs) in delivering parts of a Problem-Solving Therapy session via text.
We demonstrate that the models' capability to deliver protocolized therapy can be improved with the proper use of prompt engineering methods.
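One way such prompting can be structured is a per-step template like the sketch below; the wording and the step fields are assumptions for illustration, not the prompts evaluated in the paper.

```python
# Illustrative prompt template for guiding an LLM through one step of a
# protocolized Problem-Solving Therapy session. The wording and step
# structure are assumptions for demonstration purposes.

PST_STEP_PROMPT = """You are assisting with a structured Problem-Solving Therapy session.
Follow the protocol strictly and complete ONE step at a time.

Current step: {step_name}
Step instructions: {step_instructions}
Conversation so far:
{history}

Respond with a single, empathetic message that keeps the user on the current step.
Do not give medical advice or diagnoses."""

def render_prompt(step_name: str, step_instructions: str, history: str) -> str:
    """Fill the template for the current protocol step."""
    return PST_STEP_PROMPT.format(
        step_name=step_name,
        step_instructions=step_instructions,
        history=history,
    )
```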
arXiv Detail & Related papers (2024-08-27T17:25:16Z)
- A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions [23.36640449085249]
We trace recent advances in Medical Large Language Models (Med-LLMs).
The wide-ranging applications of Med-LLMs are investigated across various healthcare domains.
We discuss the challenges associated with ensuring fairness, accountability, privacy, and robustness.
arXiv Detail & Related papers (2024-06-06T03:15:13Z)
- Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration [54.76511683427566]
This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information.
A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction.
Experimental evaluations were conducted on the PAD-UFES20 dataset using various deep-learning architectures.
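A minimal PyTorch sketch of the general idea, combining an image branch with clinical metadata and attaching an auxiliary super-resolution head, is shown below; the backbone, layer sizes, and loss weighting are placeholders rather than the architecture reported in the paper.

```python
# Toy multimodal classifier with an auxiliary super-resolution head.
# Layer sizes and the backbone are placeholders, not the paper's model.

import torch
import torch.nn as nn

class MultimodalLesionNet(nn.Module):
    def __init__(self, num_classes: int = 6, meta_dim: int = 10):
        super().__init__()
        self.image_encoder = nn.Sequential(          # stand-in CNN backbone
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.meta_encoder = nn.Sequential(nn.Linear(meta_dim, 16), nn.ReLU())
        self.classifier = nn.Linear(16 + 16, num_classes)
        # Auxiliary head: predicts a 2x-upscaled image from the input (toy SR task).
        self.sr_head = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(3, 3, 3, padding=1),
        )

    def forward(self, image, metadata):
        feats = torch.cat([self.image_encoder(image), self.meta_encoder(metadata)], dim=1)
        return self.classifier(feats), self.sr_head(image)

def joint_loss(logits, labels, sr_out, hr_target, aux_weight=0.1):
    """Classification loss plus a weighted auxiliary reconstruction term."""
    return nn.functional.cross_entropy(logits, labels) + aux_weight * nn.functional.mse_loss(sr_out, hr_target)
```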
arXiv Detail & Related papers (2024-02-16T05:16:20Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks, including question answering (QA).
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
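A simplified version of the self-reflection idea can be sketched as a draft-critique-revise loop; `call_llm` is a hypothetical placeholder and the loop below is an assumption-laden simplification, not the paper's exact procedure.

```python
# Hedged sketch of an iterative self-reflection loop for medical QA:
# draft an answer, ask the model to critique its factual grounding,
# and revise until the critique passes or a budget is exhausted.

def call_llm(prompt: str) -> str:
    """Placeholder: route the prompt to whichever LLM backend is available."""
    raise NotImplementedError("plug in an actual model call here")

def self_reflective_answer(question: str, max_rounds: int = 3) -> str:
    answer = call_llm(f"Answer the medical question concisely.\nQuestion: {question}\nAnswer:")
    for _ in range(max_rounds):
        critique = call_llm(
            "List any statements in the answer that are not well supported by "
            "established medical knowledge. Reply 'OK' if none.\n"
            f"Question: {question}\nAnswer: {answer}\nCritique:"
        )
        if critique.strip().upper().startswith("OK"):
            break  # the model judged its own answer to be well grounded
        answer = call_llm(
            "Revise the answer to fix the issues in the critique, removing any "
            "unsupported claims.\n"
            f"Question: {question}\nAnswer: {answer}\n"
            f"Critique: {critique}\nRevised answer:"
        )
    return answer
```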
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as healthcare.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
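A minimal sketch of this self-verification pattern, assuming a generic `call_llm` placeholder and illustrative prompt wording, could look like the following; it is not the paper's prompts or pipeline.

```python
# Illustrative self-verification pass for clinical information extraction:
# after extracting entities, the same LLM is asked to quote the source
# sentence supporting each one, and unsupported items are dropped.

def call_llm(prompt: str) -> str:
    """Placeholder: route the prompt to whichever LLM backend is available."""
    raise NotImplementedError("plug in an actual model call here")

def extract_with_verification(note: str) -> list[dict]:
    raw = call_llm(f"List the medications mentioned in this clinical note, one per line:\n{note}")
    verified = []
    for item in (line.strip() for line in raw.splitlines() if line.strip()):
        evidence = call_llm(
            "Quote the exact sentence from the note that mentions "
            f"'{item}', or reply 'NOT FOUND'.\nNote:\n{note}\nEvidence:"
        )
        # Keep only items the model can ground in the note (with a literal check too).
        if "NOT FOUND" not in evidence.upper() and item.lower() in note.lower():
            verified.append({"entity": item, "evidence": evidence.strip()})
    return verified
```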
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.