CATTO: Balancing Preferences and Confidence in Language Models
- URL: http://arxiv.org/abs/2601.23096v2
- Date: Mon, 02 Feb 2026 15:45:18 GMT
- Title: CATTO: Balancing Preferences and Confidence in Language Models
- Authors: Nisarg Parikh, Ananya Sai, Pannaga Shivaswamy, Kunjal Panchal, Andrew Lan
- Abstract summary: Large language models (LLMs) often make accurate next-token predictions, but their confidence in these predictions can be poorly calibrated. We introduce a calibration-aware objective that aligns predicted confidence with empirical prediction correctness. We also introduce Confidence@k, a test-time scaling mechanism leveraging calibrated token probabilities for Bayes-optimal selection of output tokens.
- Score: 4.678970068275123
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) often make accurate next-token predictions, but their confidence in these predictions can be poorly calibrated: high-confidence predictions are frequently wrong, and low-confidence predictions may be correct. This miscalibration is exacerbated by preference-based alignment methods, which break the link between predictive probability and correctness. We introduce the Calibration-Aware Token-level Training Objective (CATTO), which aligns predicted confidence with empirical prediction correctness and can be combined with the original preference optimization objectives. Empirically, CATTO reduces Expected Calibration Error (ECE) by 2.22%-7.61% in-distribution and 1.46%-10.44% out-of-distribution compared to direct preference optimization (DPO), and by 0.22%-1.24% in-distribution and 1.23%-5.07% out-of-distribution compared to the strongest DPO baseline. This improvement in calibration does not come at the cost of task accuracy: CATTO maintains or slightly improves multiple-choice question-answering accuracy on five datasets. We also introduce Confidence@k, a test-time scaling mechanism that leverages calibrated token probabilities for Bayes-optimal selection of output tokens.
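The paper's exact objective is not reproduced here, but a minimal PyTorch sketch of the idea the abstract describes may help: a Brier-style token-level penalty on the gap between predicted confidence and empirical correctness, added to a standard DPO loss, plus best-of-k selection by sequence probability. The function names, the exact penalty form, and the weight `lam` are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a calibration-aware token-level penalty combined
# with a DPO-style preference loss; the penalty form is an assumption.
import torch
import torch.nn.functional as F

def token_calibration_penalty(logits, targets):
    """Brier-style gap between per-token confidence and 0/1 correctness."""
    probs = F.softmax(logits, dim=-1)        # (batch, seq, vocab)
    conf, pred = probs.max(dim=-1)           # confidence and argmax token
    correct = (pred == targets).float()      # 1 if the argmax matches target
    return ((conf - correct) ** 2).mean()    # mean squared confidence gap

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss on summed response log-probs under policy/reference."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(margin).mean()

def confidence_at_k(seq_logprobs):
    """Given a 1-D tensor of total log-probs for k sampled candidates,
    pick the most probable one -- Bayes-optimal if probabilities are
    well calibrated (a simple reading of Confidence@k)."""
    return int(torch.argmax(seq_logprobs))

# combined training loss, with lam a hypothetical weighting hyperparameter:
# loss = dpo_loss(...) + lam * token_calibration_penalty(logits, targets)
```

In this sketch the penalty is differentiable through the max-probability term, while the 0/1 correctness indicator is treated as a constant target.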
Related papers
- UAT-LITE: Inference-Time Uncertainty-Aware Attention for Pretrained Transformers [11.741258610945259]
UAT-LITE is an inference-time framework that makes self-attention uncertainty-aware. It reduces Expected Calibration Error (ECE) by approximately 20% on average relative to a fine-tuned BERT-base baseline, and improves selective prediction and robustness under distribution shift.
arXiv Detail & Related papers (2026-02-03T00:51:26Z) - Uncertainty-Aware Post-Hoc Calibration: Mitigating Confidently Incorrect Predictions Beyond Calibration Metrics [6.9681910774977815]
This paper presents a post-hoc calibration framework to enhance calibration quality and uncertainty-aware decision-making. A comprehensive evaluation is conducted using calibration metrics, uncertainty-aware performance measures, and empirical conformal coverage. Experiments show that the proposed method yields fewer confidently incorrect predictions and competitive Expected Calibration Error compared with isotonic and focal-loss baselines.
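As a concrete reference point for the isotonic baseline named above, here is a hedged sketch of post-hoc calibration with isotonic regression plus a "confidently incorrect" rate; the 0.9 threshold and the function names are illustrative assumptions, not this paper's code.

```python
# Post-hoc calibration with isotonic regression (a baseline named above).
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_isotonic_calibrator(val_conf, val_correct):
    """Learn a monotone map from raw to calibrated confidence on a
    held-out validation set; val_correct holds 0/1 correctness labels."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(val_conf, val_correct)
    return iso

def confidently_incorrect_rate(conf, correct, tau=0.9):
    """Fraction of all predictions that are wrong despite confidence >= tau."""
    conf, correct = np.asarray(conf), np.asarray(correct)
    return float(np.mean((conf >= tau) & (correct == 0)))

# usage: calibrated test confidences are iso.predict(test_conf)
```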
arXiv Detail & Related papers (2025-10-19T23:55:36Z) - On the calibration of Just-in-time Defect Prediction [0.0]
We evaluate the calibration of three JIT DP techniques to determine whether, and to what extent, they exhibit poor calibration. Results reveal that all evaluated JIT DP models exhibit some level of miscalibration, with ECE ranging from 2% to 35%.
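For readers unfamiliar with the ECE figures quoted throughout this list, a standard equal-width-bin implementation looks roughly like this; it is a common formulation, not the exact code used by any paper above.

```python
# Standard binned Expected Calibration Error (ECE).
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Weighted average of |bin accuracy - bin mean confidence| over
    equal-width confidence bins, weighted by bin population."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```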
arXiv Detail & Related papers (2025-04-16T13:06:20Z) - Provably Reliable Conformal Prediction Sets in the Presence of Data Poisoning [53.42244686183879]
Conformal prediction provides model-agnostic and distribution-free uncertainty quantification. Yet, conformal prediction is not reliable under poisoning attacks, where adversaries manipulate both training and calibration data. We propose reliable prediction sets (RPS): the first efficient method for constructing conformal prediction sets with provable reliability guarantees under poisoning.
arXiv Detail & Related papers (2024-10-13T15:37:11Z) - Does confidence calibration improve conformal prediction? [10.340903334800787]
We show that current confidence calibration methods lead to larger prediction sets in adaptive conformal prediction. By investigating the role of the temperature value, we observe that high-confidence predictions can enhance the efficiency of adaptive conformal prediction. We propose Conformal Temperature Scaling (ConfTS), a variant of temperature scaling with a novel loss function designed to enhance the efficiency of prediction sets.
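A minimal sketch of the split conformal procedure this entry builds on, using the simple 1 - p(true label) score; the temperature argument shows where scaling enters and why it changes set sizes. This is a generic formulation, not the paper's ConfTS loss.

```python
# Split conformal prediction sets with a temperature knob.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def conformal_prediction_sets(cal_logits, cal_labels, test_logits,
                              alpha=0.1, T=1.0):
    """Calibrate a score quantile on held-out data, then include every
    label whose softmax probability clears the threshold. Changing T
    reshapes the softmax, which moves qhat and hence the set sizes."""
    p_cal = softmax(cal_logits, T)
    scores = 1.0 - p_cal[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, level, method="higher")
    p_test = softmax(test_logits, T)
    return p_test >= 1.0 - qhat   # (n_test, n_classes) membership mask
```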
arXiv Detail & Related papers (2024-02-06T19:27:48Z) - Beyond calibration: estimating the grouping loss of modern neural networks [68.8204255655161]
Proper scoring rule theory shows that given the calibration loss, the missing piece to characterize individual errors is the grouping loss.
We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution-shift settings.
arXiv Detail & Related papers (2022-10-28T07:04:20Z) - Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
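The paper trains a separate selector network; as a simpler stand-in, a plain confidence-threshold selector already illustrates the coverage versus selective-accuracy trade-off being optimized. The names and the threshold sweep below are illustrative assumptions.

```python
# Selective classification with a simple confidence-threshold selector.
import numpy as np

def selective_metrics(conf, correct, tau):
    """Coverage and selective accuracy when abstaining below confidence tau."""
    conf, correct = np.asarray(conf), np.asarray(correct)
    accept = conf >= tau
    coverage = float(accept.mean())
    sel_acc = float(correct[accept].mean()) if accept.any() else float("nan")
    return coverage, sel_acc

# sweep the threshold to trace the coverage/accuracy trade-off:
# for tau in np.linspace(0.5, 0.99, 10):
#     print(tau, selective_metrics(conf, correct, tau))
```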
arXiv Detail & Related papers (2022-08-25T13:31:09Z) - Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensating for miscalibrated neural networks is temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
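A hedged sketch of the per-input temperature idea: a small head maps input features to a positive temperature that rescales the logits. The architecture, hidden size, and softplus offset are assumptions, not the paper's design.

```python
# Sample-dependent temperature scaling: one predicted temperature per input.
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTemperature(nn.Module):
    """Small head predicting one temperature per input from its features;
    softplus keeps the temperature strictly positive."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feats, logits):
        T = F.softplus(self.net(feats)) + 1e-3   # (batch, 1), T > 0
        return logits / T                        # per-sample scaled logits

# fit by minimizing F.cross_entropy(scaled_logits, labels) on held-out
# data with the backbone frozen, as in standard post-hoc calibration.
```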
arXiv Detail & Related papers (2022-07-13T14:13:49Z) - Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions [121.10450359856242]
We develop a frequentist procedure that utilizes influence functions of a model's loss functional to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals.
The resulting Discriminative Jackknife (DJ) is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy.
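The DJ itself replaces retraining with influence-function approximations; the brute-force leave-one-out jackknife it approximates can be sketched as follows, where Ridge and the interval construction are illustrative choices rather than the paper's setup.

```python
# Classical leave-one-out jackknife predictive interval (the expensive
# baseline that influence-function methods like DJ approximate).
import numpy as np
from sklearn.linear_model import Ridge

def jackknife_interval(X, y, x_new, alpha=0.1):
    """n leave-one-out refits give held-out residuals whose (1 - alpha)
    quantile widens the full-model point prediction at x_new."""
    n = len(y)
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = Ridge().fit(X[keep], y[keep])
        residuals[i] = abs(y[i] - model.predict(X[i:i + 1])[0])
    q = np.quantile(residuals, 1 - alpha)
    center = Ridge().fit(X, y).predict(np.atleast_2d(x_new))[0]
    return center - q, center + q
```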
arXiv Detail & Related papers (2020-06-29T13:36:52Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z) - Individual Calibration with Randomized Forecasting [116.2086707626651]
We show that calibration for individual samples is possible in the regression setup if the predictions are randomized.
We design a training objective to enforce individual calibration and use it to train randomized regression functions.
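One plausible reading of "randomized regression functions" is a forecaster that takes a freshly sampled quantile level as an extra input and is trained with the pinball loss at that level; the sketch below follows that reading and should be taken as an assumption about the method, not its published form.

```python
# Randomized forecasting via a quantile-level input and pinball loss.
import torch
import torch.nn as nn

class RandomizedForecaster(nn.Module):
    """Predicts the r-th conditional quantile of y given x, where the
    quantile level r is an extra input sampled fresh at each step."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + 1, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x, r):
        return self.net(torch.cat([x, r], dim=-1)).squeeze(-1)

def pinball_loss(pred, y, r):
    """Quantile (pinball) loss at per-example levels r in (0, 1)."""
    diff = y - pred
    r = r.squeeze(-1)
    return torch.mean(torch.maximum(r * diff, (r - 1.0) * diff))

# one training step with randomized quantile levels:
# r = torch.rand(x.shape[0], 1)
# loss = pinball_loss(model(x, r), y, r)
```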
arXiv Detail & Related papers (2020-06-18T05:53:10Z)