Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity
- URL: http://arxiv.org/abs/2506.00245v1
- Date: Fri, 30 May 2025 21:21:05 GMT
- Title: Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity
- Authors: Dang Nguyen, Ali Payani, Baharan Mirzasoleiman
- Abstract summary: Hallucination in large language models can be detected by assessing the uncertainty of model outputs, typically measured using entropy. We propose a simple black-box uncertainty quantification method inspired by nearest neighbor estimates of entropy. Our approach can also be easily extended to white-box settings by incorporating token probabilities.
- Score: 15.16188621701658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hallucination in large language models (LLMs) can be detected by assessing the uncertainty of model outputs, typically measured using entropy. Semantic entropy (SE) enhances traditional entropy estimation by quantifying uncertainty at the semantic cluster level. However, as modern LLMs generate longer one-sentence responses, SE becomes less effective because it overlooks two crucial factors: intra-cluster similarity (the spread within a cluster) and inter-cluster similarity (the distance between clusters). To address these limitations, we propose a simple black-box uncertainty quantification method inspired by nearest neighbor estimates of entropy. Our approach can also be easily extended to white-box settings by incorporating token probabilities. Additionally, we provide theoretical results showing that our method generalizes semantic entropy. Extensive empirical results demonstrate its effectiveness compared to semantic entropy across two recent LLMs (Phi3 and Llama3) and three common text generation tasks: question answering, text summarization, and machine translation. Our code is available at https://github.com/BigML-CS-UCLA/SNNE.
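The authors' exact estimator is given in the paper and the linked SNNE repository; the general idea of a nearest-neighbor-style entropy over pairwise semantic similarities can be sketched as follows. The `similarity` callable and the use of average pairwise similarity as a density proxy are illustrative assumptions, not the authors' precise formulation.

```python
import math

def pairwise_uncertainty(responses, similarity):
    """Nearest-neighbor-style uncertainty from pairwise semantic similarities.

    responses: list of sampled answer strings.
    similarity: callable (a, b) -> float in [0, 1], e.g. an NLI- or
                embedding-based semantic similarity (illustrative).
    Returns a score that is low when the samples are semantically close
    (the model is confident) and high when they are spread out.
    """
    n = len(responses)
    if n < 2:
        return 0.0
    score = 0.0
    for i in range(n):
        # Average similarity of response i to all other samples.
        sims = [similarity(responses[i], responses[j]) for j in range(n) if j != i]
        avg_sim = sum(sims) / len(sims)
        # Nearest-neighbor entropy estimators take -log of a local density;
        # here average pairwise similarity plays the role of density.
        score += -math.log(max(avg_sim, 1e-12))
    return score / n

# Toy similarity: Jaccard overlap of word sets (a stand-in for a real
# semantic similarity model).
def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

low = pairwise_uncertainty(["paris is the capital", "paris is the capital"], jaccard)
high = pairwise_uncertainty(["paris is the capital", "london maybe"], jaccard)
```

Unlike semantic entropy, no hard clustering step is needed: intra- and inter-cluster similarity both enter through the pairwise terms.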
Related papers
- Improving Semantic Uncertainty Quantification in LVLMs with Semantic Gaussian Processes [60.75226150503949]
We propose a Bayesian framework that quantifies semantic uncertainty by analyzing the geometric structure of answer embeddings. SGPU maps generated answers into a dense semantic space, computes the Gram matrix of their semantic embeddings, and summarizes their semantic configuration. We show that SGPU transfers across models and modalities, indicating that its spectral representation captures general patterns of semantic uncertainty.
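The Gram-matrix step this summary describes is simple to illustrate; how SGPU then summarizes the spectrum is in the paper, so only the matrix construction is shown here. Loosely, one dominant eigenvalue indicates the answers agree, while a flat spectrum indicates they diverge.

```python
def gram_matrix(embeddings):
    """Gram matrix of inner products between answer embeddings.

    embeddings: list of equal-length vectors (list of floats).
    Entry (i, j) is the inner product of embeddings i and j; the matrix's
    spectrum summarizes the semantic configuration of the answers.
    """
    return [[sum(x * y for x, y in zip(u, v)) for v in embeddings]
            for u in embeddings]

# Two identical answer embeddings and one orthogonal outlier.
g = gram_matrix([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```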
arXiv Detail & Related papers (2025-12-16T08:15:24Z) - Probabilities Are All You Need: A Probability-Only Approach to Uncertainty Estimation in Large Language Models [13.41454380481593]
Uncertainty estimation, often using predictive entropy estimation, is key to addressing this issue. This paper proposes an efficient, training-free uncertainty estimation method that approximates predictive entropy using the responses' top-$K$ probabilities.
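A minimal sketch of the idea: approximate predictive entropy from only the top-$K$ response probabilities. Renormalizing over the top-$K$ set is an illustrative choice here, not necessarily the paper's exact treatment of the truncated tail.

```python
import math

def topk_entropy(topk_probs):
    """Approximate predictive entropy from the K most likely responses.

    topk_probs: probabilities of the top-K responses (need not sum to 1;
    they are renormalized over the top-K set, an illustrative assumption).
    """
    total = sum(topk_probs)
    return -sum((p / total) * math.log(p / total)
                for p in topk_probs if p > 0)

peaked = topk_entropy([0.9, 0.05, 0.05])  # confident: low entropy
flat = topk_entropy([0.34, 0.33, 0.33])   # uncertain: high entropy
```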
arXiv Detail & Related papers (2025-11-10T23:31:43Z) - Estimating Semantic Alphabet Size for LLM Uncertainty Quantification [12.029394705620724]
We propose a modified semantic alphabet size estimator for semantic entropy estimation. Using it to adjust discrete semantic entropy for sample coverage results in more accurate semantic entropy estimation. Our proposed alphabet size estimator flags incorrect LLM responses as well as or better than recent top-performing approaches.
arXiv Detail & Related papers (2025-09-17T23:16:39Z) - Semantic Energy: Detecting LLM Hallucination Beyond Entropy [106.92072182161712]
Large Language Models (LLMs) are being increasingly deployed in real-world applications, but they remain susceptible to hallucinations. Uncertainty estimation is a feasible approach to detect such hallucinations. We introduce Semantic Energy, a novel uncertainty estimation framework.
arXiv Detail & Related papers (2025-08-20T07:33:50Z) - Inv-Entropy: A Fully Probabilistic Framework for Uncertainty Quantification in Language Models [5.6672926445919165]
Large language models (LLMs) have transformed natural language processing, but their reliable deployment requires effective uncertainty quantification (UQ). Existing UQ methods often lack a probabilistic foundation. We propose a fully probabilistic framework based on an inverse model, which quantifies uncertainty by evaluating the diversity of the input space conditioned on a given output through systematic perturbations.
arXiv Detail & Related papers (2025-06-11T13:02:17Z) - Entropy-Based Block Pruning for Efficient Large Language Models [81.18339597023187]
We propose an entropy-based pruning strategy to enhance efficiency while maintaining performance. Empirical analysis reveals that the entropy of hidden representations decreases in the early blocks but progressively increases across most subsequent blocks.
arXiv Detail & Related papers (2025-04-04T03:42:34Z) - Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings [11.33157177182775]
Accurately quantifying uncertainty in large language models (LLMs) is crucial for their reliable deployment.
Current state-of-the-art methods for measuring semantic uncertainty in LLMs rely on strict bidirectional entailment criteria.
We propose a novel approach that leverages semantic embeddings to achieve smoother and more robust estimation of semantic uncertainty.
arXiv Detail & Related papers (2024-10-30T04:41:46Z) - Uncertainty Quantification in Large Language Models Through Convex Hull Analysis [0.36832029288386137]
This study proposes a novel geometric approach to uncertainty quantification using convex hull analysis.
The proposed method leverages the spatial properties of response embeddings to measure the dispersion and variability of model outputs.
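The dispersion measure this summary describes can be sketched with a convex hull over response embeddings. Reducing embeddings to two dimensions (e.g. via PCA) and using hull area as the uncertainty score are illustrative simplifications of the paper's approach.

```python
def hull_area(points):
    """Area of the 2D convex hull of response embeddings (monotone chain).

    points: list of (x, y) tuples, e.g. embeddings projected to two
    principal components (an illustrative simplification). A larger hull
    area means the sampled responses are more dispersed in embedding
    space, i.e. higher uncertainty.
    """
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]

    # Shoelace formula for the area of the hull polygon.
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

tight = hull_area([(0, 0), (0.1, 0), (0, 0.1)])   # clustered responses
spread = hull_area([(0, 0), (2, 0), (0, 2), (2, 2)])  # dispersed responses
```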
arXiv Detail & Related papers (2024-06-28T07:47:34Z) - Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs [32.901839335074676]
Hallucinations present a major challenge to the practical adoption of Large Language Models.
Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meanings for a set of model generations.
We propose Semantic Entropy Probes (SEPs), which directly approximate SE from the hidden states of a single generation.
arXiv Detail & Related papers (2024-06-22T19:46:06Z) - REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy [93.8400683020273]
Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity.
We propose REAL sampling, a decoding method that improves factuality and diversity over nucleus sampling.
arXiv Detail & Related papers (2024-06-11T21:44:49Z) - Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities [79.9629927171974]
Uncertainty in Large Language Models (LLMs) is crucial for applications where safety and reliability are important.
We propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs.
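KLE is described in the paper as the von Neumann entropy of a unit-trace semantic kernel over generations. A dependency-free sketch for two responses is below; the specific kernel construction from a similarity score is an illustrative assumption, and larger kernels would need a numerical eigensolver.

```python
import math

def von_neumann_entropy_2x2(k):
    """Von Neumann entropy -sum(lam * log(lam)) of a 2x2 symmetric,
    unit-trace kernel k = [[a, b], [b, c]] with a + c = 1.

    The 2x2 closed-form eigenvalues keep this sketch dependency-free;
    in general one would diagonalize numerically.
    """
    (a, b), (_, c) = k
    mean = (a + c) / 2.0
    d = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    ent = 0.0
    for lam in (mean + d, mean - d):
        if lam > 1e-12:
            ent -= lam * math.log(lam)
    return ent

def semantic_kernel(sim):
    """Unit-trace kernel for two responses with semantic similarity sim
    (an illustrative construction: off-diagonals encode similarity)."""
    return [[0.5, sim / 2.0], [sim / 2.0, 0.5]]

similar = von_neumann_entropy_2x2(semantic_kernel(0.99))  # near-duplicates
different = von_neumann_entropy_2x2(semantic_kernel(0.0))  # unrelated
```

Semantically close generations yield a near-rank-1 kernel and low entropy; unrelated generations yield a near-identity kernel and entropy close to $\log 2$.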
arXiv Detail & Related papers (2024-05-30T12:42:05Z) - Quantifying Semantic Emergence in Language Models [31.608080868988825]
Large language models (LLMs) are widely recognized for their exceptional capacity to capture semantic meaning. In this work, we introduce a quantitative metric, Information Emergence (IE), designed to measure LLMs' ability to extract semantics from input tokens.
arXiv Detail & Related papers (2024-05-21T09:12:20Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.