Towards Reliable Test-Time Adaptation: Style Invariance as a Correctness Likelihood
- URL: http://arxiv.org/abs/2512.07390v1
- Date: Mon, 08 Dec 2025 10:23:44 GMT
- Title: Towards Reliable Test-Time Adaptation: Style Invariance as a Correctness Likelihood
- Authors: Gilhyun Nam, Taewon Kim, Joonhyun Jeong, Eunho Yang
- Abstract summary: Test-time adaptation (TTA) enables efficient adaptation of deployed models. Traditional calibration methods assume fixed models or static distributions. We introduce Style Invariance as a Correctness Likelihood (SICL). SICL estimates instance-wise correctness likelihood by measuring prediction consistency across style-altered variants.
- Score: 39.549479855380184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time adaptation (TTA) enables efficient adaptation of deployed models, yet it often leads to poorly calibrated predictive uncertainty, a critical issue in high-stakes domains such as autonomous driving, finance, and healthcare. Existing calibration methods typically assume fixed models or static distributions, resulting in degraded performance under real-world, dynamic test conditions. To address these challenges, we introduce Style Invariance as a Correctness Likelihood (SICL), a framework that leverages style-invariance for robust uncertainty estimation. SICL estimates instance-wise correctness likelihood by measuring prediction consistency across style-altered variants, requiring only the model's forward pass. This makes it a plug-and-play, backpropagation-free calibration module compatible with any TTA method. Comprehensive evaluations across four baselines, five TTA methods, and two realistic scenarios with three model architectures demonstrate that SICL reduces calibration error by an average of 13 percentage points compared to conventional calibration approaches.
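The idea in the abstract, scoring each input by how stable the prediction stays across style-altered copies using forward passes only, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the style perturbation (jittering per-channel mean and standard deviation), the function names `perturb_style` and `consistency_confidence`, the parameters `n_variants` and `strength`, and the toy classifier are all assumptions made for illustration.

```python
import numpy as np


def perturb_style(x, rng, strength=0.2):
    """Jitter per-channel mean and std of an image batch shaped (N, C, H, W).

    Channel statistics stand in for "style" here; this is an assumption,
    not the transformation used in the paper.
    """
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = x.std(axis=(2, 3), keepdims=True) + 1e-6
    normalized = (x - mu) / sigma
    new_mu = mu + strength * sigma * rng.standard_normal(mu.shape)
    new_sigma = sigma * np.exp(strength * rng.standard_normal(sigma.shape))
    return normalized * new_sigma + new_mu


def consistency_confidence(predict_fn, x, n_variants=8, strength=0.2, seed=0):
    """Return (predicted class, agreement rate) for each input.

    predict_fn maps a batch (N, C, H, W) to logits (N, K). Only forward
    passes are used; no gradients or parameter updates are involved.
    """
    rng = np.random.default_rng(seed)
    base_pred = predict_fn(x).argmax(axis=1)
    agree = np.zeros(x.shape[0])
    for _ in range(n_variants):
        variant = perturb_style(x, rng, strength)
        agree += predict_fn(variant).argmax(axis=1) == base_pred
    # Fraction of style variants whose prediction matches the original one.
    return base_pred, agree / n_variants


def toy_predict(x):
    """Hypothetical 2-class model: a fixed linear map of channel means."""
    w = np.array([[1.0, -1.0, 0.5], [-0.5, 1.0, -1.0]])  # (classes, channels)
    return x.mean(axis=(2, 3)) @ w.T


if __name__ == "__main__":
    images = np.random.default_rng(1).standard_normal((4, 3, 8, 8))
    preds, conf = consistency_confidence(toy_predict, images)
    print(preds, conf)
```

In this sketch the agreement rate is used directly as a confidence value in [0, 1]; how SICL itself turns consistency into a calibrated likelihood is not specified in the abstract, so that step is left out.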
Related papers
- Balancing Two Classifiers via A Simplex ETF Structure for Model Calibration [34.52946891778497]
Deep neural networks (DNNs) have demonstrated state-of-the-art performance across various domains. They often face calibration issues, particularly in safety-critical applications such as autonomous driving and healthcare. Recent research has started to improve model calibration from the view of the classifier.
arXiv Detail & Related papers (2025-04-14T09:09:01Z)
- Enhancing accuracy of uncertainty estimation in appearance-based gaze tracking with probabilistic evaluation and calibration [13.564919425738163]
Uncertainty in appearance-based gaze tracking is critical for ensuring reliable downstream applications. Current uncertainty-aware approaches adopt probabilistic models to acquire uncertainties by following distributions in the training dataset. We propose a correction strategy based on probability calibration to mitigate biases in the estimated uncertainties of the trained models.
arXiv Detail & Related papers (2025-01-24T19:33:55Z)
- COME: Test-time adaption by Conservatively Minimizing Entropy [45.689829178140634]
Conservatively Minimizing Entropy (COME) is a drop-in replacement for traditional entropy minimization (EM).
COME explicitly models the uncertainty by characterizing a Dirichlet prior distribution over model predictions.
We show that COME achieves state-of-the-art performance on commonly used benchmarks.
arXiv Detail & Related papers (2024-10-12T09:20:06Z)
- Robust Calibration of Large Vision-Language Adapters [17.583536041845402]
This paper addresses the critical issue of miscalibration in CLIP-based model adaptation.
We empirically demonstrate that popular CLIP adaptation approaches, such as Adapters, Prompt Learning, and Test-Time Adaptation, substantially degrade the calibration capabilities of the zero-shot baseline.
Motivated by these observations, we present a simple and model-agnostic solution to mitigate miscalibration, by scaling the logit range of each sample to its zero-shot prediction logits.
arXiv Detail & Related papers (2024-07-18T15:27:56Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
- Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration [0.6906005491572401]
We show that Information Bottleneck-based IRM achieves consistent calibration across different environments.
Our empirical evidence indicates that models exhibiting consistent calibration across environments are also well-calibrated.
arXiv Detail & Related papers (2024-01-31T02:08:43Z)
- Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z)
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z)
- DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two unfavorable defects are concealed in the prevalent adaptation methodologies like test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are completely affected by the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z)
- Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression.
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.