Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding
- URL: http://arxiv.org/abs/2406.13230v2
- Date: Tue, 12 Nov 2024 16:47:49 GMT
- Title: Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding
- Authors: Xin Liu, Farima Fatahi Bayat, Lu Wang
- Abstract summary: Calibrating language models (LMs) aligns their generation confidence with the actual likelihood of answer correctness.
We propose an activation-based calibration method, ActCab, which trains a linear layer on top of the LM's last-layer activations.
We also propose CoDec, a confidence-guided decoding strategy, to elicit truthful answers with high confidence from LMs.
- Score: 7.855485779946983
- License:
- Abstract: Calibrating language models (LMs) aligns their generation confidence with the actual likelihood of answer correctness, which can inform users about LMs' reliability and mitigate hallucinated content. However, prior calibration methods, such as self-consistency-based and logit-based approaches, are either limited in inference-time efficiency or fall short of providing informative signals. Moreover, simply filtering out low-confidence responses reduces the LM's helpfulness when the answers are correct. Therefore, effectively using calibration techniques to enhance an LM's factuality remains an unsolved challenge. In this paper, we first propose an activation-based calibration method, ActCab, which trains a linear layer on top of the LM's last-layer activations, which better capture the representations of knowledge. Built on top of ActCab, we further propose CoDec, a confidence-guided decoding strategy to elicit truthful answers with high confidence from LMs. In evaluations on five popular QA benchmarks, ActCab achieves better calibration performance than all competitive baselines, e.g., reducing the average expected calibration error (ECE) score by up to 39%. Further experiments on CoDec show consistent improvements in several LMs' factuality on challenging QA datasets, such as TruthfulQA, highlighting the value of confidence signals in enhancing factuality.
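The abstract's recipe, a single linear confidence head trained on last-layer activations and evaluated with expected calibration error (ECE), can be illustrated concretely. The following is a minimal sketch, not the paper's implementation: the hidden size, the synthetic activations and correctness labels, and the `expected_calibration_error` helper are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): a single linear layer is
# trained on cached last-layer activations to predict answer correctness, and
# its sigmoid outputs are scored with the standard expected calibration error.
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 768                                  # assumed LM hidden size
acts = torch.randn(2000, hidden_dim)              # stand-in for last-layer activations
labels = torch.randint(0, 2, (2000,)).float()     # 1 = the answer was judged correct

probe = nn.Linear(hidden_dim, 1)                  # the calibration head
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(200):                              # fit the probe to correctness labels
    optimizer.zero_grad()
    loss = loss_fn(probe(acts).squeeze(-1), labels)
    loss.backward()
    optimizer.step()

def expected_calibration_error(conf, correct, n_bins=10):
    """Bin predictions by confidence and average the |accuracy - confidence| gaps."""
    ece = torch.tensor(0.0)
    for i in range(n_bins):
        lo, hi = i / n_bins, (i + 1) / n_bins
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = (correct[mask].mean() - conf[mask].mean()).abs()
            ece += mask.float().mean() * gap
    return ece.item()

with torch.no_grad():
    confidence = torch.sigmoid(probe(acts).squeeze(-1))
print("ECE:", expected_calibration_error(confidence, labels))
```

In practice the activations would come from the LM's hidden states at answer positions rather than random tensors, and the per-response confidence from such a head is what a confidence-guided decoding step like CoDec would consult.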
Related papers
- Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level.
We also develop Confidence-Guided Fact-level Self-Correction (ConFix), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z)
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leverages the logits of only the first generated token.
We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains.
Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
arXiv Detail & Related papers (2024-11-08T12:08:17Z)
- Graph-based Confidence Calibration for Large Language Models [22.394717844099684]
We propose a novel method to develop a well-calibrated confidence estimation model.
We use a weighted graph to represent the consistency among the large language models' responses to a question (a minimal consistency-based confidence sketch appears after this list).
We then train a graph neural network to estimate the probability of correct responses.
arXiv Detail & Related papers (2024-11-03T20:36:44Z)
- Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification [52.095460362197336]
Large language models (LLMs) struggle with consistent and accurate reasoning.
LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors.
We propose a novel collaborative method integrating Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification.
arXiv Detail & Related papers (2024-10-05T05:21:48Z)
- Enhancing Healthcare LLM Trust with Atypical Presentations Recalibration [20.049443396032423]
Black-box large language models (LLMs) are increasingly deployed in various environments.
LLMs often exhibit overconfidence, leading to potential risks and misjudgments.
We propose a novel method, Atypical Presentations Recalibration, which leverages atypical presentations to adjust the model's confidence estimates.
arXiv Detail & Related papers (2024-09-05T03:45:35Z)
- Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation [18.815226646364476]
Existing calibration methods for large language models (LLMs) focus on estimating or eliciting individual confidence without taking full advantage of the "Collective Wisdom".
We propose Collaborative Calibration, a post-hoc, training-free calibration strategy that leverages the collaborative and expressive capabilities of multiple tool-augmented LLM agents in a simulated group deliberation process.
arXiv Detail & Related papers (2024-04-14T02:40:43Z)
- Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models [84.94220787791389]
We propose Fact-and-Reflection (FaR) prompting, which improves the LLM calibration in two steps.
Experiments show that FaR achieves significantly better calibration; it lowers the Expected Calibration Error by 23.5%.
FaR even elicits the capability of verbally expressing concerns in less confident scenarios.
arXiv Detail & Related papers (2024-02-27T01:37:23Z)
- Calibrating Long-form Generations from Large Language Models [34.72041258464477]
Large Language Models' (LLMs) confidence scores should align with the actual likelihood of their responses being correct.
Current confidence elicitation methods and calibration metrics rely on a binary true/false assessment of response correctness.
We introduce a unified calibration framework, in which both the correctness of the LLMs' responses and their associated confidence levels are treated as distributions across a range of scores.
arXiv Detail & Related papers (2024-02-09T17:00:32Z)
- Self-Evaluation Improves Selective Generation in Large Language Models [54.003992911447696]
We reformulate open-ended generation tasks into token-level prediction tasks.
We instruct an LLM to self-evaluate its answers.
We benchmark a range of scoring methods based on self-evaluation.
arXiv Detail & Related papers (2023-12-14T19:09:22Z)
- Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback [91.22679548111127]
A trustworthy real-world prediction system should produce well-calibrated confidence scores.
We show that verbalized confidences emitted as output tokens are typically better-calibrated than the model's conditional probabilities.
arXiv Detail & Related papers (2023-05-24T10:12:33Z)
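Several entries above, notably the self-consistency and graph-based calibration lines, build confidence from agreement among repeatedly sampled responses. Below is a minimal consistency-based sketch of that idea, not any listed paper's method: `sample_answers` is a hypothetical stand-in for sampling an LLM several times, and the graph-based approach additionally weights pairwise consistency and trains a graph neural network instead of taking a simple majority fraction.

```python
# Hedged sketch of consistency-based confidence: the majority answer's share of
# repeated samples serves as the confidence score. `sample_answers` is a
# placeholder, not a real API; swap in actual LLM sampling to use it.
from collections import Counter

def sample_answers(question: str, n: int = 8) -> list[str]:
    # Placeholder for n independent LLM generations at nonzero temperature.
    return ["Paris", "Paris", "Paris", "Lyon", "Paris", "Paris", "Marseille", "Paris"][:n]

def consistency_confidence(question: str, n: int = 8) -> tuple[str, float]:
    """Return the majority answer and the fraction of samples that agree with it."""
    answers = sample_answers(question, n)
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)

answer, confidence = consistency_confidence("What is the capital of France?")
print(answer, confidence)  # e.g. Paris 0.75
```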