Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
- URL: http://arxiv.org/abs/2502.19110v4
- Date: Mon, 11 Aug 2025 20:27:28 GMT
- Title: Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
- Authors: Zhengping Jiang, Anqi Liu, Benjamin Van Durme,
- Abstract summary: We present a framework connecting abstention and linguistic calibration through the lens of linguistic pragmatics.<n>Results demonstrate our method produces calibrated outputs with conformal guarantees on factual accuracy.
- Score: 41.45862052156885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language model outputs are not always reliable, thus prompting research into how to adapt model responses based on uncertainty. Common approaches include: \emph{abstention}, where models refrain from generating responses when uncertain; and \emph{linguistic calibration}, where models hedge their statements using uncertainty quantifiers. However, abstention can withhold valuable information, while linguistically calibrated responses are often challenging to leverage in downstream tasks. We propose a unified view, Conformal Linguistic Calibration (CLC), which reinterprets linguistic calibration as \emph{answer set prediction}. First we present a framework connecting abstention and linguistic calibration through the lens of linguistic pragmatics. We then describe an implementation of CLC that allows for controlling the level of imprecision in model responses. Results demonstrate our method produces calibrated outputs with conformal guarantees on factual accuracy. Further, our approach enables fine-tuning models to perform uncertainty-aware adaptive claim rewriting, offering a controllable balance between factuality and specificity.
Related papers
- On Calibration of Large Language Models: From Response To Capability [66.59139960234326]
Large language models (LLMs) are widely deployed as general-purpose problem solvers.<n>We introduce capability calibration, which targets the model's expected accuracy on a query.<n>Our results demonstrate that capability-calibrated confidence improves pass@$k$ prediction and inference budget allocation.
arXiv Detail & Related papers (2026-02-14T01:07:45Z) - Explicit Uncertainty Modeling for Active CLIP Adaptation with Dual Prompt Tuning [51.99383151474742]
We propose a robust uncertainty modeling framework for active CLIP adaptation based on dual-prompt tuning.<n>We show that our method consistently outperforms existing active learning methods under the same annotation budget.
arXiv Detail & Related papers (2026-02-04T09:01:55Z) - Corrective Diffusion Language Models [12.724100711773593]
We study corrective behavior in diffusion language models, defined as the ability to assign lower confidence to incorrect tokens and iteratively refine them while preserving correct content.<n>We propose a correction-oriented post-training principle that explicitly supervises visible incorrect tokens, enabling error-aware confidence and targeted refinement.
arXiv Detail & Related papers (2025-12-17T17:04:38Z) - Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal [31.458406135473805]
We present UniCR, a unified framework that turns heterogeneous uncertainty evidence into a calibrated probability of correctness.<n>UniCR learns a lightweight calibration head with temperature scaling and proper scoring.<n>Experiments on short-form QA, code generation with execution tests, and retrieval-augmented long-form QA show consistent improvements in calibration metrics.
arXiv Detail & Related papers (2025-09-01T13:14:58Z) - COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation [14.461333001997449]
Uncertainty Quantification (UQ) for Natural Language Generation (NLG) is crucial for assessing the performance of Large Language Models (LLMs)<n>We propose ourmethod, a method that explicitly adds the ground truth to the candidate outputs and uses logit scores to measure nonconformity.
arXiv Detail & Related papers (2025-02-18T07:25:12Z) - The Reliability Paradox: Exploring How Shortcut Learning Undermines Language Model Calibration [5.616884466478886]
Pre-trained language models (PLMs) have enabled significant performance gains in the field of natural language processing.<n>Recent studies have found PLMs to suffer from miscalibration, indicating a lack of accuracy in the confidence estimates provided by these models.<n>This paper investigates whether lower calibration error implies reliable decision rules for a language model.
arXiv Detail & Related papers (2024-12-17T08:04:28Z) - Enhancing Trust in Large Language Models with Uncertainty-Aware Fine-Tuning [10.457661605916435]
Large language models (LLMs) have revolutionized the field of natural language processing with their impressive reasoning and question-answering capabilities.<n>LLMs are sometimes prone to generating credible-sounding but incorrect information, a phenomenon known as hallucinations.<n>We introduce a novel uncertainty-aware causal language modeling loss function, grounded in the principles of decision theory.
arXiv Detail & Related papers (2024-12-03T23:14:47Z) - Finetuning Language Models to Emit Linguistic Expressions of Uncertainty [5.591074369497796]
Large language models (LLMs) are increasingly employed in information-seeking and decision-making tasks.
LLMs tend to generate information that conflicts with real-world facts, and their persuasive style can make these inaccuracies appear confident and convincing.
In this work, we explore supervised finetuning on uncertainty-augmented predictions as a method to develop models that produce linguistic expressions of uncertainty.
arXiv Detail & Related papers (2024-09-18T17:52:53Z) - On Subjective Uncertainty Quantification and Calibration in Natural Language Generation [2.622066970118316]
Large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging.
This work addresses these challenges from a perspective of Bayesian decision theory.
We discuss how this assumption enables principled quantification of the model's subjective uncertainty and its calibration.
The proposed methods can be applied to black-box language models.
arXiv Detail & Related papers (2024-06-07T18:54:40Z) - Absolute convergence and error thresholds in non-active adaptive
sampling [0.27624021966289597]
Non-active adaptive sampling is a way of building machine learning models from a training data base.
Proposal for calculating absolute convergence and error thresholds is described.
Tests meet our expectations and illustrate the proposal in the domain of natural language processing.
arXiv Detail & Related papers (2024-02-04T15:10:34Z) - Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z) - Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z) - Improving the Reliability of Large Language Models by Leveraging
Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination"
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z) - Calibrating AI Models for Wireless Communications via Conformal
Prediction [55.47458839587949]
Conformal prediction is applied for the first time to the design of AI for communication systems.
This paper investigates the application of conformal prediction as a general framework to obtain AI models that produce decisions with formal calibration guarantees.
arXiv Detail & Related papers (2022-12-15T12:52:23Z) - How Can We Know When Language Models Know? On the Calibration of
Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.