Related papers: Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs

URL: http://arxiv.org/abs/2406.15927v1
Date: Sat, 22 Jun 2024 19:46:06 GMT
Title: Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Authors: Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal
Abstract summary: Hallucinations present a major challenge to the practical adoption of Large Language Models. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. We propose SEPs, which directly approximate SE from the hidden states of a single generation.
Score: 32.901839335074676
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.
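The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that sampled generations have already been grouped into semantic clusters (the clustering step is not shown), that SE is estimated as the discrete entropy over cluster frequencies, and that the probe is a logistic regression trained on a binarized SE target. The function names, the binarization by median, and the synthetic hidden states are all hypothetical placeholders.

```python
import numpy as np


def semantic_entropy(cluster_ids):
    """Discrete entropy over semantic clusters of sampled generations.

    cluster_ids[i] is the (assumed, precomputed) meaning cluster of sample i.
    """
    _, counts = np.unique(np.asarray(cluster_ids), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def train_probe(hidden, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe by plain gradient descent.

    hidden: (n, d) array of hidden states, one row per single generation.
    labels: (n,) array of 0/1 targets (1 = high semantic entropy).
    """
    n, d = hidden.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(hidden @ w + b)))
        grad = probs - labels  # gradient of the mean log-loss w.r.t. logits
        w -= lr * (hidden.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def predict_high_entropy(hidden, w, b):
    """Probability that each hidden state corresponds to high SE."""
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))


# Synthetic placeholder data (NOT real model activations or SE values):
# a single direction in hidden space carries the "uncertainty" signal.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(200, 8))
se_scores = hidden_states[:, 0]  # stand-in for per-question SE
labels = (se_scores > np.median(se_scores)).astype(float)  # hypothetical binarization

w, b = train_probe(hidden_states, labels)
scores = predict_high_entropy(hidden_states, w, b)
print("train accuracy:", ((scores > 0.5) == (labels == 1)).mean())
print("SE of 5 samples in clusters [0, 0, 1, 2, 2]:", semantic_entropy([0, 0, 1, 2, 2]))
```

At training time the expensive step (sampling several generations and computing SE) supplies the targets; at test time only `predict_high_entropy` runs, on the hidden state of one generation, which is where the claimed cost saving would come from.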

Related papers

Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models [0.0]
We propose Counterfactual Probing, a novel approach for detecting and mitigating hallucinations in large language models. Our method dynamically generates counterfactual statements that appear plausible but contain subtle factual errors, then evaluates the model's sensitivity to these perturbations.
arXiv Detail & Related papers (2025-08-03T17:29:48Z)
Pretrained LLMs Learn Multiple Types of Uncertainty [23.807232455808613]
Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. In this work, we study how well LLMs capture uncertainty, without explicitly being trained for that. We show that, if considering uncertainty as a linear concept in the model's latent space, it might indeed be captured, even after only pretraining.
arXiv Detail & Related papers (2025-05-27T14:06:15Z)
Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation [78.78421340836915]
We systematically investigate reference-free hallucination detection in open-domain long-form responses. Our findings reveal that internal states are insufficient for reliably distinguishing between factual and hallucinated content. We introduce a new paradigm, named RATE-FT, that augments fine-tuning with an auxiliary task for the model to jointly learn with the main task of hallucination detection.
arXiv Detail & Related papers (2025-05-18T07:10:03Z)
One-for-More: Continual Diffusion Model for Anomaly Detection [61.12622458367425]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images. Our study found that the diffusion model suffers from severe "faithfulness hallucination" and "catastrophic forgetting". We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z)
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation-enhanced hallucination-detection model called HuDEx. The proposed model provides a novel approach that integrates detection with explanations, enabling both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z)
Enhancing Hallucination Detection through Noise Injection [9.582929634879932]
Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations. We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense. We propose a very simple and efficient approach that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling.
arXiv Detail & Related papers (2025-02-06T06:02:20Z)
Effort: Efficient Orthogonal Modeling for Generalizable AI-Generated Image Detection [66.16595174895802]
Existing AI-generated image (AIGI) detection methods often suffer from limited generalization performance. In this paper, we identify a crucial yet previously overlooked asymmetry phenomenon in AIGI detection.
arXiv Detail & Related papers (2024-11-23T19:10:32Z)
Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy [31.05551799523973]
Large Language Models (LLMs) are known to hallucinate, whereby they generate plausible but inaccurate text. This phenomenon poses significant risks in critical applications, such as medicine or law, necessitating robust hallucination mitigation strategies. We propose fine-tuning using semantic entropy, an uncertainty measure derived from introspection into the model which does not require external labels.
arXiv Detail & Related papers (2024-10-22T17:54:03Z)
LoGU: Long-form Generation with Uncertainty Expressions [49.76417603761989]
We introduce the task of Long-form Generation with Uncertainty (LoGU). We identify two key challenges: Uncertainty Suppression and Uncertainty Misalignment. Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims. Experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
arXiv Detail & Related papers (2024-10-18T09:15:35Z)
REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z)
Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models. This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution. We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
Quantifying Emergence in Large Language Models [31.608080868988825]
We propose a quantifiable solution for estimating emergence of LLMs. Inspired by emergentism in dynamics, we quantify the strength of emergence by comparing the entropy reduction of the macroscopic (semantic) level with that of the microscopic (token) level. Our method demonstrates consistent behaviors across a suite of LMs under both in-context learning (ICL) and natural sentences.
arXiv Detail & Related papers (2024-05-21T09:12:20Z)
Distributional Inclusion Hypothesis and Quantifications: Probing for Hypernymy in Functional Distributional Semantics [50.363809539842386]
Functional Distributional Semantics (FDS) models the meaning of words by truth-conditional functions. We show that FDS models learn hypernymy on a restricted class of corpus that strictly follows the Distributional Inclusion Hypothesis (DIH).
arXiv Detail & Related papers (2023-09-15T11:28:52Z)
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings. We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models. Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
Lazy Estimation of Variable Importance for Large Neural Networks [22.95405462638975]
We propose a fast and flexible method for approximating the reduced model with important inferential guarantees. We demonstrate our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.
arXiv Detail & Related papers (2022-07-19T06:28:17Z)
Statistics and Deep Learning-based Hybrid Model for Interpretable Anomaly Detection [0.0]
Hybrid methods have been shown to outperform pure statistical and pure deep learning methods at forecasting tasks. MES-LSTM is an interpretable anomaly detection model that overcomes these challenges.
arXiv Detail & Related papers (2022-02-25T14:17:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.