Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering
- URL: http://arxiv.org/abs/2602.06022v1
- Date: Thu, 05 Feb 2026 18:55:56 GMT
- Title: Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering
- Authors: Miranda Muqing Miao, Young-Min Cho, Lyle Ungar
- Abstract summary: We introduce CORAL, a regularized inference-time steering method that captures correctness signals from model internal activations using weight-decay probes. CORAL consistently improves accuracy by 10% and expected calibration error (ECE) by 50% on average. Our results support the hypothesis that distributed information in model internals can be extracted using regularized probes when individual neurons are insufficient.
- Score: 3.7758197704962835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) exhibit persistent miscalibration, especially after instruction tuning and preference alignment. Modified training objectives can improve calibration, but retraining is expensive. Inference-time steering offers a lightweight alternative, yet most existing methods optimize proxies for correctness rather than correctness itself. We introduce CORAL (Correctness-Optimized Residual Activation Lens), a regularized inference-time steering method that captures distributed correctness signals from model internal activations using weight-decay MLP probes. We evaluate CORAL across three 7B-parameter models and find that it consistently improves accuracy by 10% and expected calibration error (ECE) by 50% on average. We additionally demonstrate that these gains transfer without retraining to the complete published test sets of four held-out benchmarks (ARC-Challenge, HellaSwag, Math-MC, OpenBookQA), averaging 14% accuracy improvements and 49% ECE improvements. Our results support the hypothesis that distributed information in model internals can be extracted using regularized probes when individual neurons are insufficient. CORAL thus provides a compute-efficient, transferable, and calibration-aware approach to improve MCQA performance during inference.
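The abstract reports results in terms of expected calibration error (ECE). As a rough illustration of that metric, the sketch below computes the standard binned ECE: predictions are grouped by confidence, and the gap between mean confidence and accuracy in each bin is averaged with weights proportional to bin size. The function name, the use of 10 equal-width bins, and the toy inputs are assumptions made for illustration; the abstract does not state which binning scheme the authors use.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE with equal-width confidence bins (illustrative sketch).

    confidences: predicted probability of the chosen answer, in [0, 1].
    correct: 1 if the chosen answer was right, else 0.
    n_bins: number of equal-width bins (10 is an assumed default).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; bins are right-closed, and 0.0 falls in the first bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        # |accuracy - mean confidence| in this bin, weighted by bin frequency.
        gap = abs(correct[mask].mean() - confidences[mask].mean())
        ece += mask.mean() * gap
    return float(ece)


# Toy example: four answers at 90% confidence, three of them correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

In the toy example, accuracy is 0.75 against a mean confidence of 0.9, so the ECE is about 0.15. A lower value means better calibration, so an "ECE improvement" in the abstract should be read as a reduction in this quantity.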
Related papers
- CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training [73.46600457802693]
We introduce a new method that counteracts the loss induced by quantization. CAGE significantly improves upon the state-of-the-art methods in terms of accuracy, for similar computational cost. For QAT pre-training of Llama models, CAGE matches the accuracy achieved at 4-bits (W4A4) with the prior best method.
arXiv Detail & Related papers (2025-10-21T16:33:57Z)
- Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty [59.97939500426759]
This paper describes RLCR, an approach to training reasoning models that jointly improves accuracy and confidence estimation. We show that across diverse datasets, RLCR substantially improves calibration with no loss in accuracy. We also demonstrate that verbalized confidence can be leveraged at test time to improve accuracy and calibration.
arXiv Detail & Related papers (2025-07-22T17:56:01Z)
- Fill In The Gaps: Model Calibration and Generalization with Synthetic Data [2.89287673224661]
We propose a calibration method that incorporates synthetic data without compromising accuracy.
We derive the expected calibration error (ECE) bound using the Probably Approximately Correct (PAC) learning framework.
On average, we observed up to a 34% increase in accuracy and a 33% decrease in ECE.
arXiv Detail & Related papers (2024-10-07T23:06:42Z)
- Calibrating Language Models with Adaptive Temperature Scaling [58.056023173579625]
We introduce Adaptive Temperature Scaling (ATS), a post-hoc calibration method that predicts a temperature scaling parameter for each token prediction.
ATS improves calibration by over 10-50% across three downstream natural language evaluation benchmarks compared to prior calibration methods.
arXiv Detail & Related papers (2024-09-29T22:54:31Z)
- Optimizing Class-Level Probability Reweighting Coefficients for Equitable Prompting Accuracy [12.287692969438169]
LLMs often uncover biases from pre-training data's statistical regularities. This leads to persistent, uneven class accuracy in classification and QA. We develop a post-hoc probability reweighting method that directly optimizes for non-differentiable performance-driven metrics.
arXiv Detail & Related papers (2024-05-13T10:30:33Z)
- Towards Unbiased Calibration using Meta-Regularization [6.440598446802981]
We propose to learn better-calibrated models via meta-regularization, which has two components.
We evaluate the effectiveness of the proposed approach in regularizing neural networks towards improved and unbiased calibration on three computer vision datasets.
arXiv Detail & Related papers (2023-03-27T10:00:50Z)
- AdaFocal: Calibration-aware Adaptive Focal Loss [8.998525155518836]
Training with focal loss leads to better calibration than cross-entropy.
We propose a calibration-aware adaptive focal loss called AdaFocal.
arXiv Detail & Related papers (2022-11-21T20:19:24Z)
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A post-hoc approach to compensate for neural networks being wrong is to perform temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
- Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
- Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation [10.209143402485406]
This paper argues that calibration is important in practice and is easy to maintain. We introduce a simple training procedure based on recalibration that yields calibrated models without sacrificing overall performance.
arXiv Detail & Related papers (2021-12-14T06:19:05Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that improves the LCE better than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.