Related papers: Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence

Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence

URL: http://arxiv.org/abs/2405.12412v3
Date: Wed, 22 Oct 2025 21:36:17 GMT
Title: Assessing the Probabilistic Fit of Neural Regressors via Conditional Congruence
Authors: Spencer Young, Riley Sinema, Cole Edgren, Andrew Hall, Nathan Dong, Porter Jenkins,
Abstract summary: Existing approaches for measuring this misalignment are primarily developed under the framework of calibration.<n>We introduce a metric, Congruence Error (CCE), that uses conditional kernel mean embeddings to estimate the distance, at any point, between the learned predictive distribution and the empirical, conditional distribution in a dataset.<n>We perform several high dimensional regression tasks and show that CCE exhibits four critical properties: $textitcorrectness$, $textitmonotonicity$, $textitreliability$, and $textitrobustness$.
Score: 2.13382635602206
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While significant progress has been made in specifying neural networks capable of representing uncertainty, deep networks still often suffer from overconfidence and misaligned predictive distributions. Existing approaches for measuring this misalignment are primarily developed under the framework of calibration, with common metrics such as Expected Calibration Error (ECE). However, calibration can only provide a strictly marginal assessment of probabilistic alignment. Consequently, calibration metrics such as ECE are $\textit{distribution-wise}$ measures and cannot diagnose the $\textit{point-wise}$ reliability of individual inputs, which is important for real-world decision-making. We propose a stronger condition, which we term $\textit{conditional congruence}$, for assessing probabilistic fit. We also introduce a metric, Conditional Congruence Error (CCE), that uses conditional kernel mean embeddings to estimate the distance, at any point, between the learned predictive distribution and the empirical, conditional distribution in a dataset. We perform several high dimensional regression tasks and show that CCE exhibits four critical properties: $\textit{correctness}$, $\textit{monotonicity}$, $\textit{reliability}$, and $\textit{robustness}$.

Related papers

Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification [0.0]
This work bridges information geometry and statistical learning, offering formal guarantees for uncertainty-aware classification in applications requiring rigorous validation.<n> Empirical validation on Adeno-Associated Virus classification demonstrates that the two-stage framework captures 72.5% of errors while deferring 34.5% of samples, reducing automated decision error rates from 16.8% to 6.9%.
arXiv Detail & Related papers (2025-11-26T01:29:49Z)
CCE: Confidence-Consistency Evaluation for Time Series Anomaly Detection [56.302586730134806]
We introduce Confidence-Consistency Evaluation (CCE), a novel evaluation metric.<n>CCE simultaneously measures prediction confidence and uncertainty consistency.<n>We also establish RankEval, a benchmark for comparing the ranking capabilities of various metrics.
arXiv Detail & Related papers (2025-09-01T03:38:38Z)
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question.<n>COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate.<n>We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
When Can We Reuse a Calibration Set for Multiple Conformal Predictions? [0.0]
We show how e-conformal prediction, in conjunction with Hoeffding's inequality, can enable the repeated use of a single calibration set.<n>We train a deep neural network and utilise a calibration set to estimate a Hoeffding correction.<n>This correction allows us to apply a modified Markov's inequality, leading to the construction of prediction sets with quantifiable confidence.
arXiv Detail & Related papers (2025-06-24T14:57:25Z)
Pointwise confidence estimation in the non-linear $\ell^2$-regularized least squares [12.352761060862072]
We consider a high-probability non-asymptotic confidence estimation in the $ell2$-regularized non-linear least-squares setting with fixed design.<n>We show a pointwise confidence bound, meaning that it holds for the prediction on any given fixed test input $x$.
arXiv Detail & Related papers (2025-06-08T11:23:49Z)
SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU) We develop two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level. Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
arXiv Detail & Related papers (2025-04-19T03:01:45Z)
Probabilistic Scores of Classifiers, Calibration is not Enough [0.32985979395737786]
In binary classification tasks, accurate representation of probabilistic predictions is essential for various real-world applications. In this study, we highlight approaches that prioritize the alignment between predicted scores and true probability distributions. Our findings reveal limitations in traditional calibration metrics, which could undermine the reliability of predictive models for critical decision-making.
arXiv Detail & Related papers (2024-08-06T19:53:00Z)
Decoupling of neural network calibration measures [45.70855737027571]
We investigate the coupling of different neural network calibration measures with a special focus on the Area Under Sparsification Error curve (AUSE) metric. We conclude that the current methodologies leave a degree of freedom, which prevents a unique model for the homologation of safety-critical functionalities.
arXiv Detail & Related papers (2024-06-04T15:21:37Z)
Robust Conformal Prediction under Distribution Shift via Physics-Informed Structural Causal Model [24.58531056536442]
Conformal prediction (CP) handles uncertainty by predicting a set on a test input. This coverage can be guaranteed on test data even if the marginal distributions $P_X$ differ between calibration and test datasets. We propose a physics-informed structural causal model (PI-SCM) to reduce the upper bound.
arXiv Detail & Related papers (2024-03-22T08:13:33Z)
Calibration by Distribution Matching: Trainable Kernel Calibration Metrics [56.629245030893685]
We introduce kernel-based calibration metrics that unify and generalize popular forms of calibration for both classification and regression. These metrics admit differentiable sample estimates, making it easy to incorporate a calibration objective into empirical risk minimization. We provide intuitive mechanisms to tailor calibration metrics to a decision task, and enforce accurate loss estimation and no regret decisions.
arXiv Detail & Related papers (2023-10-31T06:19:40Z)
Calibrating Neural Simulation-Based Inference with Differentiable Coverage Probability [50.44439018155837]
We propose to include a calibration term directly into the training objective of the neural model. By introducing a relaxation of the classical formulation of calibration error we enable end-to-end backpropagation. It is directly applicable to existing computational pipelines allowing reliable black-box posterior inference.
arXiv Detail & Related papers (2023-10-20T10:20:45Z)
Calibration-Aware Bayesian Learning [37.82259435084825]
This paper proposes an integrated framework, referred to as calibration-aware Bayesian neural networks (CA-BNNs) It applies both data-dependent or data-independent regularizers while optimizing over a variational distribution as in Bayesian learning. Numerical results validate the advantages of the proposed approach in terms of expected calibration error (ECE) and reliability diagrams.
arXiv Detail & Related papers (2023-05-12T14:19:15Z)
A Consistent and Differentiable Lp Canonical Calibration Error Estimator [21.67616079217758]
Deep neural networks are poorly calibrated and tend to output overconfident predictions. We propose a low-bias, trainable calibration error estimator based on Dirichlet kernel density estimates. Our method has a natural choice of kernel, and can be used to generate consistent estimates of other quantities.
arXiv Detail & Related papers (2022-10-13T15:11:11Z)
Evaluating Aleatoric Uncertainty via Conditional Generative Models [15.494774321257939]
We study conditional generative models for aleatoric uncertainty estimation. We introduce two metrics to measure the discrepancy between two conditional distributions. We demonstrate numerically how our metrics provide correct measurements of conditional distributional discrepancies.
arXiv Detail & Related papers (2022-06-09T05:39:04Z)
Bayesian Confidence Calibration for Epistemic Uncertainty Modelling [4.358626952482686]
We introduce a framework to obtain confidence estimates in conjunction with an uncertainty of the calibration method. We achieve state-of-the-art calibration performance for object detection calibration.
arXiv Detail & Related papers (2021-09-21T10:53:16Z)
Distribution-free uncertainty quantification for classification under label shift [105.27463615756733]
We focus on uncertainty quantification (UQ) for classification problems via two avenues. We first argue that label shift hurts UQ, by showing degradation in coverage and calibration. We examine these techniques theoretically in a distribution-free framework and demonstrate their excellent practical performance.
arXiv Detail & Related papers (2021-03-04T20:51:03Z)
Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators. They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions. We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions. We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test. Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
Distribution-free binary classification: prediction sets, confidence intervals and calibration [106.50279469344937]
We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting. We derive confidence intervals for binned probabilities for both fixed-width and uniform-mass binning. As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration.
arXiv Detail & Related papers (2020-06-18T14:17:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.