Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
- URL: http://arxiv.org/abs/2406.15927v1
- Date: Sat, 22 Jun 2024 19:46:06 GMT
- Title: Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
- Authors: Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal
- Abstract summary: Hallucinations present a major challenge to the practical adoption of Large Language Models.
Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations.
We propose SEPs, which directly approximate SE from the hidden states of a single generation.
- Score: 32.901839335074676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.
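The two ideas in the abstract can be illustrated with a minimal sketch, under stated assumptions rather than as the authors' implementation. It assumes the sampled generations have already been grouped into meaning clusters (the cluster ids below are made up), uses a simplified count-based entropy over those clusters, and assumes the probe is a logistic regression trained on SE labels binarized at a hypothetical threshold; the abstract itself does not specify the probe or the threshold. The names `semantic_entropy`, `train_probe`, `probe_score`, `SE_THRESHOLD`, and the toy `hidden_states` are illustrative only.

```python
import math
from collections import Counter


def semantic_entropy(cluster_ids):
    """Count-based entropy over meaning clusters of sampled generations.

    Assumption: cluster_ids[i] is the semantic cluster of the i-th sample;
    the clustering step itself is not shown here.
    """
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(features, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression probe with plain batch gradient descent."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_score(weights, bias, x):
    """Predicted probability that the input has high semantic entropy."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy training set (all values are made up for illustration).
# Each question has several sampled answers, already clustered by meaning.
sampled_clusters = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 0, 0, 1], [0, 1, 1, 2]]
# One stand-in "hidden state" per question, taken from a single generation.
hidden_states = [[0.1, 1.0], [0.9, 0.2], [0.2, 0.8], [0.8, 0.1]]

SE_THRESHOLD = 0.8  # hypothetical binarization threshold
se_values = [semantic_entropy(c) for c in sampled_clusters]
labels = [1 if se > SE_THRESHOLD else 0 for se in se_values]

weights, bias = train_probe(hidden_states, labels)
scores = [probe_score(weights, bias, h) for h in hidden_states]
print(se_values)
print(scores)
```

In this sketch the expensive part (sampling several generations and clustering them) is needed only to build training labels. At test time only `probe_score` runs on the hidden state of one generation, which is the source of the cost saving the abstract describes.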
Related papers
- REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z)
- Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
arXiv Detail & Related papers (2024-05-30T12:42:05Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Quantifying Emergence in Large Language Models [31.608080868988825]
We propose a quantifiable solution for estimating emergence of LLMs.
Inspired by emergentism in dynamics, we quantify the strength of emergence by comparing the entropy reduction of the macroscopic (semantic) level with that of the microscopic (token) level.
Our method demonstrates consistent behaviors across a suite of LMs under both in-context learning (ICL) and natural sentences.
arXiv Detail & Related papers (2024-05-21T09:12:20Z)
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
- Distributional Inclusion Hypothesis and Quantifications: Probing for Hypernymy in Functional Distributional Semantics [50.363809539842386]
Functional Distributional Semantics (FDS) models the meaning of words by truth-conditional functions.
We show that FDS models learn hypernymy on a restricted class of corpora that strictly follow the Distributional Inclusion Hypothesis (DIH).
arXiv Detail & Related papers (2023-09-15T11:28:52Z)
- Interpreting and Improving Diffusion Models from an Optimization Perspective [4.5993996573872185]
We use this observation to interpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function.
We propose a new gradient-estimation sampler, generalizing DDIM using insights from our theoretical results.
arXiv Detail & Related papers (2023-06-08T00:56:33Z)
- CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z)
- Lazy Estimation of Variable Importance for Large Neural Networks [22.95405462638975]
We propose a fast and flexible method for approximating the reduced model with important inferential guarantees.
We demonstrate our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.
arXiv Detail & Related papers (2022-07-19T06:28:17Z)
- Statistics and Deep Learning-based Hybrid Model for Interpretable Anomaly Detection [0.0]
Hybrid methods have been shown to outperform pure statistical and pure deep learning methods at both forecasting tasks.
MES-LSTM is an interpretable anomaly detection model that overcomes these challenges.
arXiv Detail & Related papers (2022-02-25T14:17:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.