Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
- URL: http://arxiv.org/abs/2405.06545v1
- Date: Fri, 10 May 2024 15:40:50 GMT
- Title: Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval
- Authors: Mengjia Niu, Hao Li, Jie Shi, Hamed Haddadi, Fan Mo,
- Abstract summary: Large language models (LLMs) have demonstrated remarkable capabilities across various domains.
Their susceptibility to hallucination poses significant challenges for their deployment in critical areas such as healthcare.
We propose Self-Refinement-Enhanced Knowledge Graph Retrieval (Re-KGR) to augment the factuality of LLMs' responses.
- Score: 14.58181631462891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across various domains, although their susceptibility to hallucination poses significant challenges for their deployment in critical areas such as healthcare. To address this issue, retrieving relevant facts from knowledge graphs (KGs) is considered a promising method. Existing KG-augmented approaches tend to be resource-intensive, requiring multiple rounds of retrieval and verification for each factoid, which impedes their application in real-world scenarios. In this study, we propose Self-Refinement-Enhanced Knowledge Graph Retrieval (Re-KGR) to augment the factuality of LLMs' responses with less retrieval efforts in the medical field. Our approach leverages the attribution of next-token predictive probability distributions across different tokens, and various model layers to primarily identify tokens with a high potential for hallucination, reducing verification rounds by refining knowledge triples associated with these tokens. Moreover, we rectify inaccurate content using retrieved knowledge in the post-processing stage, which improves the truthfulness of generated responses. Experimental results on a medical dataset demonstrate that our approach can enhance the factual capability of LLMs across various foundational models as evidenced by the highest scores on truthfulness.
Related papers
- Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding [92.32881381717594]
We introduce ALternate Contrastive Decoding (ALCD) to solve hallucination issues in medical information extraction tasks.
ALCD demonstrates significant improvements in resolving hallucination issues compared to conventional decoding methods.
arXiv Detail & Related papers (2024-10-21T07:19:19Z) - Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models [46.05020842978823]
Large Language Models (LLMs) have emerged as powerful tools to navigate this complex data landscape.
RAGGED is a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation.
arXiv Detail & Related papers (2024-07-17T07:44:18Z) - M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering [14.198330378235632]
We use Multiple Choice and Abstractive Question Answering to conduct a large-scale empirical study on 22 datasets in three generalist and three specialist biomedical sub-domains.
Our multifaceted analysis of the performance of 15 LLMs uncovers success factors such as instruction tuning that lead to improved recall and comprehension.
We show that while recently proposed domain-adapted models may lack adequate knowledge, directly fine-tuning on our collected medical knowledge datasets shows encouraging results.
We complement the quantitative results with a skill-oriented manual error analysis, which reveals a significant gap between the models' capabilities to simply recall necessary knowledge and to integrate it with the presented
arXiv Detail & Related papers (2024-06-06T02:43:21Z) - Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models [10.04914417538886]
Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment.
We propose a new textitDistill-Retrieve-Read framework instead of the previous textitRetrieve-then-Read.
arXiv Detail & Related papers (2024-04-27T13:11:42Z) - Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z) - HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses [20.635793525894872]
We develop a Hypothesis Knowledge Graph Enhanced (HyKGE) framework to improve the accuracy and reliability of Large Language Models (LLMs)
Specifically, HyKGE explores the zero-shot capability and the rich knowledge of LLMs with Hypothesis Outputs to extend feasible exploration directions in the KGs.
Experiments on two Chinese medical multiple-choice question datasets and one Chinese open-domain medical Q&A dataset with two LLM turbos demonstrate the superiority of HyKGE in terms of accuracy and explainability.
arXiv Detail & Related papers (2023-12-26T04:49:56Z) - Mitigating Large Language Model Hallucinations via Autonomous Knowledge
Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process.
Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z) - Towards Mitigating Hallucination in Large Language Models via
Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of
Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.