Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
- URL: http://arxiv.org/abs/2502.19110v1
- Date: Wed, 26 Feb 2025 13:01:49 GMT
- Title: Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
- Authors: Zhengping Jiang, Anqi Liu, Benjamin Van Durme
- Abstract summary: We propose a unified framework that connects abstention and linguistic calibration through the lens of linguistic pragmatics. We describe an implementation that allows for controlling the level of imprecision in model responses. Our approach enables fine-tuning models to perform uncertainty-aware adaptive claim rewriting, offering a controllable balance between factuality and specificity.
- Score: 41.45862052156885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language model outputs are not always reliable; this prompts research into methods for adapting model responses based on uncertainty. Common approaches include: \emph{abstention}, where models refrain from generating responses when uncertain; and \emph{linguistic calibration}, where models hedge their statements using uncertainty quantifiers. However, abstention can withhold valuable information, while linguistically calibrated responses are often challenging to leverage in downstream tasks. We propose a unifying view of both approaches, Conformal Linguistic Calibration (CLC), reinterpreting linguistic calibration as answer set prediction. We begin by presenting a unified framework that connects abstention and linguistic calibration through the lens of linguistic pragmatics. We then describe an implementation that allows for controlling the level of imprecision in model responses. Experimental results show that our method produces calibrated outputs with conformal guarantees on factual accuracy. Furthermore, our approach enables fine-tuning models to perform uncertainty-aware adaptive claim rewriting, offering a controllable balance between factuality and specificity.
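For intuition, below is a minimal sketch of the split conformal calibration step that underlies coverage guarantees of this kind. It is a generic illustration rather than the authors' exact CLC procedure, and the scoring function, candidate answers, and numeric values are hypothetical stand-ins.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """Split conformal calibration: return the finite-sample-corrected
    (1 - alpha)-quantile of nonconformity scores from a held-out set.
    Under exchangeability, answer sets built with this threshold contain
    the true answer with probability >= 1 - alpha (marginal coverage)."""
    n = len(cal_scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(cal_scores, q_level, method="higher"))

def answer_set(candidates, scores, threshold):
    """Keep every candidate whose nonconformity score is within the
    calibrated threshold; a larger set corresponds to a less specific
    (more hedged) claim, a singleton to a fully specific answer."""
    return [c for c, s in zip(candidates, scores) if s <= threshold]

# Hypothetical usage: scores could be, e.g., negative log-probabilities
# that the model assigns to each candidate answer.
rng = np.random.default_rng(0)
cal_scores = rng.random(500)                    # stand-in calibration scores
tau = conformal_threshold(cal_scores, alpha=0.1)
candidates = ["Paris", "Lyon", "Marseille"]
cand_scores = [0.12, 0.45, 0.97]                # stand-in test scores
print(answer_set(candidates, cand_scores, tau))
```

In CLC's framing, such an answer set would then be verbalized as a single, less specific claim, which is where the controllable trade-off between factuality and specificity arises.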
Related papers
- COPU: Conformal Prediction for Uncertainty Quantification in Natural Language Generation [14.461333001997449]
Uncertainty Quantification (UQ) for Natural Language Generation (NLG) is crucial for assessing the performance of Large Language Models (LLMs). We propose COPU, a method that explicitly adds the ground truth to the candidate outputs and uses logit scores to measure nonconformity.
arXiv Detail & Related papers (2025-02-18T07:25:12Z)
- The Reliability Paradox: Exploring How Shortcut Learning Undermines Language Model Calibration [5.616884466478886]
Pre-trained language models (PLMs) have enabled significant performance gains in the field of natural language processing. Recent studies have found PLMs to suffer from miscalibration, indicating a lack of accuracy in the confidence estimates provided by these models. This paper investigates whether lower calibration error implies reliable decision rules for a language model.
arXiv Detail & Related papers (2024-12-17T08:04:28Z)
- Finetuning Language Models to Emit Linguistic Expressions of Uncertainty [5.591074369497796]
Large language models (LLMs) are increasingly employed in information-seeking and decision-making tasks.
LLMs tend to generate information that conflicts with real-world facts, and their persuasive style can make these inaccuracies appear confident and convincing.
In this work, we explore supervised finetuning on uncertainty-augmented predictions as a method to develop models that produce linguistic expressions of uncertainty.
arXiv Detail & Related papers (2024-09-18T17:52:53Z)
- On Subjective Uncertainty Quantification and Calibration in Natural Language Generation [2.622066970118316]
Applications of large language models often involve generating free-form responses, in which case uncertainty quantification becomes challenging.
This work addresses these challenges from the perspective of Bayesian decision theory.
We discuss how this perspective enables principled quantification of the model's subjective uncertainty and its calibration.
The proposed methods can be applied to black-box language models.
arXiv Detail & Related papers (2024-06-07T18:54:40Z)
- Absolute convergence and error thresholds in non-active adaptive sampling [0.27624021966289597]
Non-active adaptive sampling is a way of building machine learning models from a training database.
A proposal for calculating absolute convergence and error thresholds is described.
Tests meet our expectations and illustrate the proposal in the domain of natural language processing.
arXiv Detail & Related papers (2024-02-04T15:10:34Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling [69.83976050879318]
In large language models (LLMs), identifying sources of uncertainty is an important step toward improving reliability, trustworthiness, and interpretability.
In this paper, we introduce an uncertainty decomposition framework for LLMs, called input clarification ensembling.
Our approach generates a set of clarifications for the input, feeds them into an LLM, and ensembles the corresponding predictions.
arXiv Detail & Related papers (2023-11-15T05:58:35Z)
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
- Calibrating AI Models for Wireless Communications via Conformal Prediction [55.47458839587949]
Conformal prediction is applied for the first time to the design of AI for communication systems.
This paper investigates the application of conformal prediction as a general framework to obtain AI models that produce decisions with formal calibration guarantees.
arXiv Detail & Related papers (2022-12-15T12:52:23Z)
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness (a minimal calibration sketch follows this list).
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
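The last entry above examines calibrating QA models so that confidence tracks correctness. As a hedged illustration, here is a minimal sketch of temperature scaling, one standard post-hoc calibration baseline in this line of work, not necessarily that paper's own recipe; the data below are synthetic stand-ins.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll_at_temperature(T, logits, labels):
    """Average negative log-likelihood of the correct answers after
    rescaling the logits by a single temperature T."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)                 # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Post-hoc temperature scaling: choose T > 0 minimizing NLL on a
    held-out validation split, then divide test-time logits by T."""
    res = minimize_scalar(nll_at_temperature, bounds=(0.05, 10.0),
                          args=(logits, labels), method="bounded")
    return res.x

# Synthetic stand-in: 200 questions with 4 answer candidates each, where the
# raw scores are deliberately overconfident.
rng = np.random.default_rng(0)
val_logits = rng.normal(size=(200, 4)) * 5.0
val_labels = rng.integers(0, 4, size=200)
T = fit_temperature(val_logits, val_labels)
print(f"fitted temperature: {T:.2f}")   # T > 1 flattens overconfident scores
```

A single scalar fitted this way leaves accuracy unchanged (the argmax is preserved) while reshaping confidence, which is why it is a common point of comparison in calibration studies.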
This list is automatically generated from the titles and abstracts of the papers on this site.