Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
- URL: http://arxiv.org/abs/2406.15927v1
- Date: Sat, 22 Jun 2024 19:46:06 GMT
- Title: Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
- Authors: Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal
- Abstract summary: Hallucinations present a major challenge to the practical adoption of Large Language Models.
Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations.
We propose SEPs, which directly approximate SE from the hidden states of a single generation.
- Score: 32.901839335074676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.
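The two ideas in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that sampled generations have already been grouped into meaning clusters (the cluster labels below are made up), that each prompt comes with a hidden-state vector (the 2-dimensional vectors are toy values), and that the probe is a logistic regression trained on SE binarized at a hand-picked threshold. The names `semantic_entropy`, `train_probe`, `probe_score`, and `SE_THRESHOLD` are hypothetical.

```python
import math
from collections import Counter


def semantic_entropy(cluster_ids):
    """Entropy (in nats) of the empirical distribution over meaning clusters.

    cluster_ids holds one label per sampled generation; generations with the
    same meaning share a label. How the labels are obtained (for example with
    an entailment model) is outside this sketch.
    """
    n = len(cluster_ids)
    counts = Counter(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def probe_score(w, b, hidden_state):
    """Probability, according to the probe, that SE is high for this input."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, hidden_state)) + b)


def train_probe(hidden_states, labels, lr=0.5, epochs=500):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w, b = [0.0] * len(hidden_states[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(hidden_states, labels):
            grad = probe_score(w, b, x) - y  # derivative of log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Toy training set: (hypothetical hidden state, cluster labels of sampled answers).
train = [
    ([0.1, 0.9], [0, 0, 0, 0]),  # all samples agree: low SE
    ([0.9, 0.2], [0, 1, 2, 3]),  # all samples disagree: high SE
    ([0.2, 0.8], [0, 0, 0, 1]),
    ([0.8, 0.1], [0, 1, 1, 2]),
]

SE_THRESHOLD = 0.8  # assumed cut-off for binarizing SE, chosen by hand here

states = [h for h, _ in train]
labels = [int(semantic_entropy(c) > SE_THRESHOLD) for _, c in train]
w, b = train_probe(states, labels)

# At test time only a single hidden state is needed, with no extra sampling.
print(probe_score(w, b, [0.85, 0.15]))
```

The expensive part, sampling several generations and clustering them, happens only when building training targets. At inference the probe reads one hidden state, which is where the near-zero overhead claimed in the abstract comes from.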
Related papers
- HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses [0.12499537119440242]
This paper proposes an explanation-enhanced hallucination-detection model called HuDEx.
The proposed model provides a novel approach to integrating detection with explanations, and enables both users and the LLM itself to understand and reduce errors.
arXiv Detail & Related papers (2025-02-12T04:17:02Z) - Enhancing Hallucination Detection through Noise Injection [9.582929634879932]
Large Language Models (LLMs) are prone to generating plausible yet incorrect responses, known as hallucinations.
We show that detection can be improved significantly by taking into account model uncertainty in the Bayesian sense.
We propose a very simple and efficient approach that perturbs an appropriate subset of model parameters, or equivalently hidden unit activations, during sampling.
arXiv Detail & Related papers (2025-02-06T06:02:20Z) - AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
We introduce AutoElicit to extract knowledge from large language models and construct priors for predictive models.
We show these priors are informative and can be refined using natural language.
We find that AutoElicit yields priors that can substantially reduce error over uninformative priors, using fewer labels, and consistently outperform in-context learning.
arXiv Detail & Related papers (2024-11-26T10:13:39Z) - Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy [31.05551799523973]
Large Language Models (LLMs) are known to hallucinate, whereby they generate plausible but inaccurate text.
This phenomenon poses significant risks in critical applications, such as medicine or law, necessitating robust hallucination mitigation strategies.
We propose fine-tuning using semantic entropy, an uncertainty measure derived from introspection into the model which does not require external labels.
arXiv Detail & Related papers (2024-10-22T17:54:03Z) - LoGU: Long-form Generation with Uncertainty Expressions [49.76417603761989]
We introduce the task of Long-form Generation with Uncertainty (LoGU).
We identify two key challenges: Uncertainty Suppression and Uncertainty Misalignment.
Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims.
Experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
arXiv Detail & Related papers (2024-10-18T09:15:35Z) - REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z) - Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z) - Distributional Inclusion Hypothesis and Quantifications: Probing for
Hypernymy in Functional Distributional Semantics [50.363809539842386]
Functional Distributional Semantics (FDS) models the meaning of words by truth-conditional functions.
We show that FDS models learn hypernymy on a restricted class of corpora that strictly follow the Distributional Inclusion Hypothesis (DIH).
arXiv Detail & Related papers (2023-09-15T11:28:52Z) - CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion
Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z) - Lazy Estimation of Variable Importance for Large Neural Networks [22.95405462638975]
We propose a fast and flexible method for approximating the reduced model with important inferential guarantees.
We demonstrate our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.
arXiv Detail & Related papers (2022-07-19T06:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.