A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of
LLMs by Validating Low-Confidence Generation
- URL: http://arxiv.org/abs/2307.03987v2
- Date: Sat, 12 Aug 2023 14:57:37 GMT
- Title: A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of
LLMs by Validating Low-Confidence Generation
- Authors: Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, and Dong Yu
- Abstract summary: Large language models often tend to 'hallucinate', which critically hampers their reliability.
We propose an approach that actively detects and mitigates hallucinations during the generation process.
We show that the proposed active detection and mitigation approach successfully reduces the hallucinations of the GPT-3.5 model from 47.5% to 14.5% on average.
- Score: 76.34411067299331
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently developed large language models have achieved remarkable success in
generating fluent and coherent text. However, these models often tend to
'hallucinate', which critically hampers their reliability. In this work, we
address this crucial problem and propose an approach that actively detects and
mitigates hallucinations during the generation process. Specifically, we first
identify candidates of potential hallucination by leveraging the model's logit
output values, check their correctness through a validation procedure, mitigate
the detected hallucinations, and then continue with the generation process.
Through extensive experiments with GPT-3.5 (text-davinci-003) on the 'article
generation task', we first demonstrate the individual efficacy of our detection
and mitigation techniques. Specifically, the detection technique achieves a
recall of ~88% and the mitigation technique successfully mitigates 57.6% of the
correctly detected hallucinations. Importantly, our mitigation technique does
not introduce new hallucinations even in the case of incorrectly detected
hallucinations, i.e., false positives. Then, we show that the proposed active
detection and mitigation approach successfully reduces the hallucinations of
the GPT-3.5 model from 47.5% to 14.5% on average. We further demonstrate the
effectiveness and wide applicability of our approach through additional studies
including performance on different types of questions (multi-hop and false
premise questions) and with another LLM from a different model family (Vicuna).
In summary, our work contributes to improving the reliability and
trustworthiness of large language models, a crucial step en route to enabling
their widespread adoption in real-world applications.
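The abstract describes a four-step loop: flag low-confidence concepts using the model's logit output values, validate the flagged concepts, repair confirmed hallucinations, and resume generation from the corrected text. The Python sketch below illustrates only that control flow under assumed interfaces; the callables (generate_sentence, is_supported, repair), the concept-to-logprob mapping, and the 0.1 confidence threshold are illustrative placeholders, not the authors' released implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

# Assumed cutoff for "low confidence"; the real threshold would be tuned.
CONF_THRESHOLD = 0.1


def min_token_prob(token_logprobs: List[float]) -> float:
    """Score a concept by the probability of its least likely token."""
    return min(math.exp(lp) for lp in token_logprobs)


def generate_with_active_validation(
    prompt: str,
    generate_sentence: Callable[[str], Tuple[str, Dict[str, List[float]]]],
    is_supported: Callable[[str, str], bool],
    repair: Callable[[str, str], str],
    max_sentences: int = 5,
) -> str:
    """Active detect-and-mitigate loop (sketch): flag low-confidence concepts
    in each newly generated sentence, validate them, repair the sentence if a
    hallucination is confirmed, then continue from the corrected context."""
    context = prompt
    for _ in range(max_sentences):
        # Hypothetical generator: returns the next sentence plus per-concept token logprobs.
        sentence, concept_logprobs = generate_sentence(context)
        for concept, logprobs in concept_logprobs.items():
            if min_token_prob(logprobs) >= CONF_THRESHOLD:
                continue  # model is confident about this concept; skip validation
            if not is_supported(sentence, concept):
                sentence = repair(sentence, concept)  # mitigate before moving on
        context = context + " " + sentence  # generation resumes from the repaired text
    return context
```

The key design point, per the title's "stitch in time", is that the repaired sentence is appended to the context before generation continues, so later sentences are never conditioned on a detected hallucination.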
Related papers
- Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation [21.31915988262898]
We introduce a novel framework, the Active Retrieval-Augmented large vision-language model (ARA), specifically designed to address hallucinations.
Our empirical observations suggest that by utilizing fitting retrieval mechanisms and timing the retrieval judiciously, we can effectively mitigate the hallucination problem.
arXiv Detail & Related papers (2024-08-01T13:38:58Z) - KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs)
It uses step-wise reasoning, multi-formulation query, multi-form knowledge for factual checking, and fusion-based detection mechanism.
Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
arXiv Detail & Related papers (2024-04-03T02:52:07Z) - InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers [12.427232123205671]
Large Language Models (LLMs) invent answers that sound realistic, yet drift away from factual truth.
We present a novel method for detecting hallucinations in large language models, which tackles a critical issue in the adoption of these models in various real-world scenarios.
We observe up to 87% hallucinations for Llama-2 in a specific experiment, where our method achieves a Balanced Accuracy of 81%, all without relying on external knowledge.
arXiv Detail & Related papers (2024-03-05T11:50:01Z) - Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinating untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs.
arXiv Detail & Related papers (2023-11-22T08:39:17Z) - Language Models Hallucinate, but May Excel at Fact Verification [89.0833981569957]
Large language models (LLMs) frequently "hallucinate," resulting in non-factual outputs.
Even GPT-3.5 produces factual outputs less than 25% of the time.
This underscores the importance of fact verifiers for measuring and incentivizing progress.
arXiv Detail & Related papers (2023-10-23T04:39:01Z) - A New Benchmark and Reverse Validation Method for Passage-level
Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks.
We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion.
We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets called AutoHall.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z) - Zero-Resource Hallucination Prevention for Large Language Models [45.4155729393135]
"Hallucination" refers to instances where large language models (LLMs) generate factually inaccurate or ungrounded information.
We introduce a novel pre-detection self-evaluation technique, referred to as SELF-FAMILIARITY, which evaluates the model's familiarity with the concepts present in the input instruction.
We validate SELF-FAMILIARITY across four different large language models, demonstrating consistently superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-09-06T01:57:36Z)