HaloScope: Harnessing Unlabeled LLM Generations for Hallucination
Detection
- URL: http://arxiv.org/abs/2409.17504v1
- Date: Thu, 26 Sep 2024 03:22:09 GMT
- Title: HaloScope: Harnessing Unlabeled LLM Generations for Hallucination
Detection
- Authors: Xuefeng Du, Chaowei Xiao, Yixuan Li
- Abstract summary: HaloScope is a novel learning framework that leverages unlabeled large language model (LLM) generations in the wild for hallucination detection.
We present an automated membership estimation score for distinguishing between truthful and untruthful generations within unlabeled mixture data.
Experiments show that HaloScope can achieve superior hallucination detection performance, outperforming the competitive rivals by a significant margin.
- Score: 55.596406899347926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The surge in applications of large language models (LLMs) has prompted
concerns about the generation of misleading or fabricated information, known as
hallucinations. Therefore, detecting hallucinations has become critical to
maintaining trust in LLM-generated content. A primary challenge in learning a
truthfulness classifier is the lack of a large amount of labeled truthful and
hallucinated data. To address the challenge, we introduce HaloScope, a novel
learning framework that leverages the unlabeled LLM generations in the wild for
hallucination detection. Such unlabeled data arises freely upon deploying LLMs
in the open world, and consists of both truthful and hallucinated information.
To harness the unlabeled data, we present an automated membership estimation
score for distinguishing between truthful and untruthful generations within
unlabeled mixture data, thereby enabling the training of a binary truthfulness
classifier on top. Importantly, our framework does not require extra data
collection and human annotations, offering strong flexibility and practicality
for real-world applications. Extensive experiments show that HaloScope can
achieve superior hallucination detection performance, outperforming the
competitive rivals by a significant margin. Code is available at
https://github.com/deeplearningwisc/haloscope.
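The abstract describes two steps: an automated membership estimation score over unlabeled generations, and a binary truthfulness classifier trained on the resulting split. The snippet below is a minimal, hypothetical sketch of that two-step pattern, not the released implementation: it assumes per-generation hidden-state embeddings have already been extracted, scores each generation by its projection onto the top singular directions of the centered embedding matrix, and fits a logistic-regression classifier on thresholded pseudo-labels. The function names, scoring subspace, threshold, and classifier choice are illustrative assumptions; see the linked repository for the actual method.

```python
# Minimal sketch (not the authors' implementation): membership-style scoring of
# unlabeled LLM generations, followed by a binary truthfulness classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def membership_scores(embeddings: np.ndarray, k: int = 2) -> np.ndarray:
    """Score each generation by the norm of its projection onto the top-k
    singular directions of the centered embedding matrix (illustrative choice)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:k].T            # (n, k) projections onto the subspace
    return np.linalg.norm(proj, axis=1)   # larger score = closer to the subspace

def train_truthfulness_classifier(embeddings: np.ndarray, quantile: float = 0.5):
    """Threshold the scores into pseudo-labels and fit a linear classifier."""
    scores = membership_scores(embeddings)
    pseudo_labels = (scores > np.quantile(scores, quantile)).astype(int)  # 1 = flagged (assumption)
    return LogisticRegression(max_iter=1000).fit(embeddings, pseudo_labels)

# Usage with random stand-in embeddings (real ones would come from an LLM's hidden states).
emb = np.random.randn(200, 64)
clf = train_truthfulness_classifier(emb)
print(clf.predict_proba(emb[:5])[:, 1])   # per-generation probability of being flagged
```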
Related papers
- The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM's Internal States [0.5573267589690007]
We focus on hallucinations involving information not present in the models' training data, identified via recency: the information emerged only after the models' training cut-off dates.
This study investigates these hallucinations by detecting them at sentence level using different internal states of various language models.
Our results show that IAVs detect hallucinations as effectively as CEVs, and reveal that answerable and unanswerable prompts are encoded differently, suggesting that separate classifiers be trained for the two cases.
arXiv Detail & Related papers (2024-12-22T15:08:24Z)
- LLM Hallucination Reasoning with Zero-shot Knowledge Test [10.306443936136425]
We introduce a new task, Hallucination Reasoning, which classifies LLM-generated text into one of three categories: aligned, misaligned, and fabricated.
Our experiments conducted on new datasets demonstrate the effectiveness of our method in hallucination reasoning.
arXiv Detail & Related papers (2024-11-14T18:55:26Z)
- Mitigating Entity-Level Hallucination in Large Language Models [11.872916697604278]
This paper proposes Dynamic Retrieval Augmentation based on Hallucination Detection (DRAD), a novel method to detect and mitigate hallucinations in Large Language Models (LLMs).
Experiment results show that DRAD demonstrates superior performance in both detecting and mitigating hallucinations in LLMs.
arXiv Detail & Related papers (2024-07-12T16:47:34Z)
- Exploring and Evaluating Hallucinations in LLM-Powered Code Generation [14.438161741833687]
Large Language Models (LLMs) can produce outputs that deviate from users' intent, exhibit internal inconsistencies, or misalign with factual knowledge.
Existing work mainly focuses on investigating hallucinations in the domain of natural language generation.
We conduct a thematic analysis of the LLM-generated code to summarize and categorize the hallucinations present in it.
We propose HalluCode, a benchmark for evaluating the performance of code LLMs in recognizing hallucinations.
arXiv Detail & Related papers (2024-04-01T07:31:45Z)
- Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models [68.91592125175787]
Hallucinations pose a significant challenge for the practical implementation of large language models (LLMs).
We present Rowen, a novel approach that enhances LLMs with a selective retrieval augmentation process tailored to address hallucinations.
arXiv Detail & Related papers (2024-02-16T11:55:40Z)
- Knowledge Verification to Nip Hallucination in the Bud [69.79051730580014]
We demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs.
We propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge.
We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales.
arXiv Detail & Related papers (2024-01-19T15:39:49Z)
- Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus [99.33091772494751]
Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.
LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations.
We propose a novel reference-free, uncertainty-based method for detecting hallucinations in LLMs (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-11-22T08:39:17Z)
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data [102.56792377624927]
Hallucinations inherent in machine-generated data remain under-explored.
We present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm.
Our method reduces hallucinations by a relative 44.6% while maintaining performance competitive with LLaVA.
arXiv Detail & Related papers (2023-11-22T04:52:58Z)
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction (a sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
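The entry above on uncertainty-based detection ("Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus") treats the model's own token-level confidence as a hallucination signal. Below is a toy, library-free sketch of that general idea, not the paper's method (which aggregates uncertainty more selectively): it assumes the per-token probabilities recorded during decoding are available, and it flags a generation when its average or peak token surprisal exceeds illustrative thresholds.

```python
# Toy sketch of reference-free, uncertainty-based flagging: score a generation by
# how surprising its own tokens were to the model that produced it.
import math

def uncertainty_score(token_probs: list[float]) -> dict:
    """token_probs: probability the model assigned to each generated token."""
    surprisals = [-math.log(max(p, 1e-12)) for p in token_probs]
    return {
        "mean_nll": sum(surprisals) / len(surprisals),  # average surprisal
        "max_nll": max(surprisals),                      # single most uncertain token
    }

def flag_hallucination(token_probs, mean_thresh=2.5, max_thresh=6.0) -> bool:
    s = uncertainty_score(token_probs)
    # Thresholds are illustrative; in practice they would be tuned on validation data.
    return s["mean_nll"] > mean_thresh or s["max_nll"] > max_thresh

print(flag_hallucination([0.9, 0.8, 0.001, 0.7]))  # the 0.001 token alone trips the max threshold
```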
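The AutoHall entry mentions zero-resource, black-box detection based on self-contradiction. The sketch below illustrates the general sampling-and-cross-checking idea rather than the paper's pipeline: `sample_answer` and `contradicts` are hypothetical stand-ins for a real LLM sampling call and an NLI-style contradiction judge.

```python
# Toy sketch of self-contradiction checking: sample several answers to the same
# question and flag the question if the samples disagree with one another.
from itertools import combinations
from typing import Callable

def self_contradiction_rate(question: str,
                            sample_answer: Callable[[str], str],
                            contradicts: Callable[[str, str], bool],
                            k: int = 5) -> float:
    answers = [sample_answer(question) for _ in range(k)]
    pairs = list(combinations(answers, 2))
    return sum(contradicts(a, b) for a, b in pairs) / len(pairs)

# Trivial stand-ins just to make the sketch executable:
canned = iter(["Paris", "Paris", "Lyon", "Paris", "Paris"])
rate = self_contradiction_rate(
    "What is the capital of France?",
    sample_answer=lambda q: next(canned),
    contradicts=lambda a, b: a != b,   # a real system would use an NLI model here
)
print(rate)  # a high rate suggests the model hallucinates on this question
```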
This list is automatically generated from the titles and abstracts of the papers in this site.