A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
- URL: http://arxiv.org/abs/2307.03987v2
- Date: Sat, 12 Aug 2023 14:57:37 GMT
- Title: A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
- Authors: Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, and Dong Yu
- Abstract summary: Large language models often tend to 'hallucinate', which critically hampers their reliability.
We propose an approach that actively detects and mitigates hallucinations during the generation process.
We show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently developed large language models have achieved remarkable success in
generating fluent and coherent text. However, these models often tend to
'hallucinate', which critically hampers their reliability. In this work, we
address this crucial problem and propose an approach that actively detects and
mitigates hallucinations during the generation process. Specifically, we first
identify candidates of potential hallucination by leveraging the model's logit
output values, check their correctness through a validation procedure, mitigate
the detected hallucinations, and then continue with the generation process.
Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article
generation task', we first demonstrate the individual efficacy of our detection
and mitigation techniques. Specifically, the detection technique achieves a
recall of ~88% and the mitigation technique successfully mitigates 57.6% of the
correctly detected hallucinations. Importantly, our mitigation technique does
not introduce new hallucinations even in the case of incorrectly detected
hallucinations, i.e., false positives. Then, we show that the proposed active
detection and mitigation approach successfully reduces the hallucinations of
the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the
effectiveness and wide applicability of our approach through additional studies
including performance on different types of questions (multi-hop and false
premise questions) and with another LLM from a different model family (Vicuna).
In summary, our work contributes to improving the reliability and
trustworthiness of large language models, a crucial step en route to enabling
their widespread adoption in real-world applications.
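The loop described in the abstract (flag low-confidence spans from the token probabilities, validate them, repair the sentence, then resume generation) can be pictured roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the threshold value, the sentence-level granularity, and the three helper functions (generate_sentence_with_logprobs, validated_by_evidence, repair_sentence) are hypothetical stand-ins for real LLM, retrieval, and rewriting calls.

```python
from typing import List, Tuple

# Hypothetical stubs: a real implementation would call an LLM API that exposes
# per-token log-probabilities plus a retrieval backend for validation.

LOGPROB_THRESHOLD = -1.0  # tokens below this are treated as low-confidence candidates


def generate_sentence_with_logprobs(context: str) -> List[Tuple[str, float]]:
    """Generate the next sentence and return (token, logprob) pairs (stub)."""
    return [("Paris", -0.05), ("was", -0.01), ("founded", -1.9),
            ("in", -0.02), ("250", -2.3), ("BC", -2.1)]


def validated_by_evidence(sentence: str, span: str) -> bool:
    """Check a flagged span against retrieved evidence (stub)."""
    return False


def repair_sentence(sentence: str, failing_spans: List[str]) -> str:
    """Ask the model to rewrite the sentence so it agrees with the evidence (stub)."""
    return sentence + " [revised after validation]"


def generate_article(prompt: str, max_sentences: int = 3) -> str:
    """Sentence-by-sentence generation with active hallucination checking."""
    article = prompt
    for _ in range(max_sentences):
        tokens = generate_sentence_with_logprobs(article)
        sentence = " ".join(tok for tok, _ in tokens)
        # 1. Detection: flag low-probability tokens as potential hallucinations.
        candidates = [tok for tok, lp in tokens if lp < LOGPROB_THRESHOLD]
        # 2. Validation: check each flagged span against evidence.
        failing = [span for span in candidates if not validated_by_evidence(sentence, span)]
        # 3. Mitigation: repair the sentence if any span fails validation.
        if failing:
            sentence = repair_sentence(sentence, failing)
        # 4. Continue generating from the (possibly corrected) sentence.
        article = article + " " + sentence
    return article


if __name__ == "__main__":
    print(generate_article("Write a short article about the history of Paris."))
```

The point of working sentence by sentence is that a corrected sentence becomes part of the context for later sentences, so an early error is fixed before it can propagate through the rest of the article.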
Related papers
- HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation-enhanced hallucination-detection model, named HuDEx.
The proposed model provides a novel approach to integrating detection with explanations, enabling both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z)
- Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models [20.175106988135454]
We introduce a novel Attention-Guided SElf-Reflection (AGSER) approach for zero-shot hallucination detection in Large Language Models (LLMs).
The AGSER method utilizes attention contributions to categorize the input query into attentive and non-attentive queries.
In addition to its efficacy in detecting hallucinations, AGSER notably reduces computational overhead, requiring only three passes through the LLM and utilizing two sets of tokens.
arXiv Detail & Related papers (2025-01-17T07:30:01Z)
- Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation [21.31915988262898]
We introduce a novel framework, the Active Retrieval-Augmented large vision-language model (ARA), specifically designed to address hallucinations.
Our empirical observations suggest that by utilizing fitting retrieval mechanisms and timing the retrieval judiciously, we can effectively mitigate the hallucination problem.
arXiv Detail & Related papers (2024-08-01T13:38:58Z)
- KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs).
It uses step-wise reasoning, multi-formulation queries, multi-form knowledge for factual checking, and a fusion-based detection mechanism.
Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
arXiv Detail & Related papers (2024-04-03T02:52:07Z)
- Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
However, LLMs are prone to hallucinating untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
- Language Models Hallucinate, but May Excel at Fact Verification [89.0833981569957]
Large language models (LLMs) frequently "hallucinate," resulting in non-factual outputs.
Even GPT-3.5 produces factual outputs less than 25% of the time.
This underscores the importance of fact verifiers for measuring and incentivizing progress.
arXiv Detail & Related papers (2023-10-23T04:39:01Z)
- A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
- Zero-Resource Hallucination Prevention for Large Language Models [45.4155729393135]
"Hallucination" refers to instances where large language models (LLMs) generate factually inaccurate or ungrounded information.
We introduce a novel pre-language self-evaluation technique, referred to as SELF-FAMILIARITY, which focuses on evaluating the model's familiarity with the concepts present in the input instruction.
We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques (see the rough sketch after this list).
arXiv Detail & Related papers (2023-09-06T01:57:36Z)
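As referenced in the SELF-FAMILIARITY entry above, the prevention idea can be sketched roughly as follows, under the assumption that concept extraction and familiarity scoring would each be done with separate LLM calls. Both helpers below are hypothetical stubs for illustration only, not the paper's method.

```python
from typing import List

FAMILIARITY_THRESHOLD = 0.5  # abstain when any concept scores below this


def extract_concepts(instruction: str) -> List[str]:
    """Pull candidate concepts/entities out of the instruction (stub)."""
    return [word.strip(".,?") for word in instruction.split() if word[:1].isupper()]


def familiarity_score(concept: str) -> float:
    """Score the model's familiarity with a concept, e.g. by asking it to
    describe the concept and scoring that description's confidence (stub)."""
    return 0.2 if concept == "Zorblax" else 0.9  # "Zorblax" is a made-up example


def guarded_answer(instruction: str) -> str:
    """Answer only when every concept in the instruction looks familiar."""
    unfamiliar = [c for c in extract_concepts(instruction)
                  if familiarity_score(c) < FAMILIARITY_THRESHOLD]
    if unfamiliar:
        return "Not confident about: " + ", ".join(unfamiliar) + "; declining to answer."
    return "<model answer>"


if __name__ == "__main__":
    print(guarded_answer("Describe the Zorblax engine used by Tesla."))
```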