Related papers: Calibrating LLM Confidence by Probing Perturbed Representation Stability

Calibrating LLM Confidence by Probing Perturbed Representation Stability

URL: http://arxiv.org/abs/2505.21772v1
Date: Tue, 27 May 2025 21:14:04 GMT
Title: Calibrating LLM Confidence by Probing Perturbed Representation Stability
Authors: Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese H. Smiley, Kundan Thind, Mohammad M. Ghassemi,
Abstract summary: Miscalibration in Large Language Models (LLMs) undermines their reliability, highlighting the need for accurate confidence estimation.<n>We introduce CCPS, a novel method analyzing internal representational stability in LLMs.<n>We show that CCPS reduces Expected Error by approximately 55% and Brier-Pro benchmarks by 21%, while increasing accuracy by 5 percentage points.
Score: 2.2289267617545616
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Miscalibration in Large Language Models (LLMs) undermines their reliability, highlighting the need for accurate confidence estimation. We introduce CCPS (Calibrating LLM Confidence by Probing Perturbed Representation Stability), a novel method analyzing internal representational stability in LLMs. CCPS applies targeted adversarial perturbations to final hidden states, extracts features reflecting the model's response to these perturbations, and uses a lightweight classifier to predict answer correctness. CCPS was evaluated on LLMs from 8B to 32B parameters (covering Llama, Qwen, and Mistral architectures) using MMLU and MMLU-Pro benchmarks in both multiple-choice and open-ended formats. Our results show that CCPS significantly outperforms current approaches. Across four LLMs and three MMLU variants, CCPS reduces Expected Calibration Error by approximately 55% and Brier score by 21%, while increasing accuracy by 5 percentage points, Area Under the Precision-Recall Curve by 4 percentage points, and Area Under the Receiver Operating Characteristic Curve by 6 percentage points, all relative to the strongest prior method. CCPS delivers an efficient, broadly applicable, and more accurate solution for estimating LLM confidence, thereby improving their trustworthiness.

Related papers

SGIC: A Self-Guided Iterative Calibration Framework for RAG [45.17496149653415]
Large language models (LLMs) capitalize on their robust in-context reasoning.<n>We present a new framework that employs uncertainty scores as a tool.<n>We also introduce an innovative approach for constructing an iterative self-calibration training set.
arXiv Detail & Related papers (2025-06-19T09:45:13Z)
Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception [58.62352010928591]
Large language models (LLMs) exhibit impressive performance across diverse tasks but often struggle to accurately gauge their knowledge boundaries.<n>This paper explores leveraging LLMs' internal states to enhance their perception of knowledge boundaries from efficiency and risk perspectives.
arXiv Detail & Related papers (2025-02-17T11:11:09Z)
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices.<n>One promising solution is uncertainty-based SLM routing, offloading high-stakes queries to stronger LLMs when resulting in low-confidence responses on SLM.<n>We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z)
Rational Tuning of LLM Cascades via Probabilistic Modeling [0.9208007322096532]
We present a probabilistic model for the joint performance distribution of a sequence of large language models (LLMs)<n>Compared to selecting confidence thresholds using Bayesian optimization, our Markov parametric-copula model yields more favorable error-cost trade-offs.<n>Our framework's inductive assumptions about the interactions between the error rates of different LLMs enhance sample efficiency.
arXiv Detail & Related papers (2025-01-16T07:58:33Z)
Influences on LLM Calibration: A Study of Response Agreement, Loss Functions, and Prompt Styles [4.477423478591491]
Calib-n is a novel framework that trains an auxiliary model for confidence estimation.<n>We find that few-shot prompts are the most effective for auxiliary model-based methods.
arXiv Detail & Related papers (2025-01-07T18:48:42Z)
Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level. We also develop Confidence-Guided Fact-level Self-Correction ($textbfConFix$), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z)
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs [21.94487480599671]
Calibrated Fine-Tuning (UQ4CT) captures and calibrates uncertainty over the space of functions that map input prompts to outputs.<n>We implement UQ4CT during the fine-tuning stage via a mixture-of-experts framework that hierarchically decomposes the functional space.<n>Even under distribution shift, UQ4CT maintains superior ECE performance with high accuracy, showcasing improved generalizability.
arXiv Detail & Related papers (2024-10-09T00:09:15Z)
Confidence Estimation for LLM-Based Dialogue State Tracking [9.305763502526833]
Estimation of a model's confidence on its outputs is critical for Conversational AI systems based on large language models (LLMs) We provide an exhaustive exploration of methods, including approaches proposed for open- and closed-weight LLMs. Our findings suggest that fine-tuning open-weight LLMs can result in enhanced AUC performance, indicating better confidence score calibration.
arXiv Detail & Related papers (2024-09-15T06:44:26Z)
Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode. We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models [63.118592279833656]
Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs)<n>We propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths at the group-wise.<n> Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths.
arXiv Detail & Related papers (2024-05-23T16:21:48Z)
Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models [84.94220787791389]
We propose Fact-and-Reflection (FaR) prompting, which improves the LLM calibration in two steps. Experiments show that FaR achieves significantly better calibration; it lowers the Expected Error by 23.5%. FaR even elicits the capability of verbally expressing concerns in less confident scenarios.
arXiv Detail & Related papers (2024-02-27T01:37:23Z)
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs [60.61002524947733]
Previous confidence elicitation methods rely on white-box access to internal model information or model fine-tuning. This leads to a growing need to explore the untapped area of black-box approaches for uncertainty estimation. We define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency.
arXiv Detail & Related papers (2023-06-22T17:31:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.