Related papers: Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification

URL: http://arxiv.org/abs/2602.04775v1
Date: Wed, 04 Feb 2026 17:12:04 GMT
Title: Interval-Based AUC (iAUC): Extending ROC Analysis to Uncertainty-Aware Classification
Authors: Yuqi Li, Matthew M. Engelhard
Abstract summary: We propose an uncertainty-aware ROC framework specifically for interval-valued predictions. We introduce two new measures: $AUC_L$ and $AUC_U$. We prove that under valid class-conditional coverage, $AUC_L$ and $AUC_U$ provide formal lower and upper bounds on the theoretical optimal AUC.
Score: 12.024101882027466
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In high-stakes risk prediction, quantifying uncertainty through interval-valued predictions is essential for reliable decision-making. However, standard evaluation tools like the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are designed for point scores and fail to capture the impact of predictive uncertainty on ranking performance. We propose an uncertainty-aware ROC framework specifically for interval-valued predictions, introducing two new measures: $AUC_L$ and $AUC_U$. This framework enables an informative three-region decomposition of the ROC plane, partitioning pairwise rankings into correct, incorrect, and uncertain orderings. This approach naturally supports selective prediction by allowing models to abstain from ranking cases with overlapping intervals, thereby optimizing the trade-off between abstention rate and discriminative reliability. We prove that under valid class-conditional coverage, $AUC_L$ and $AUC_U$ provide formal lower and upper bounds on the theoretical optimal AUC ($AUC^*$), characterizing the physical limit of achievable discrimination. The proposed framework applies broadly to interval-valued prediction models, regardless of the interval construction method. Experiments on real-world benchmark datasets, using bootstrap-based intervals as one instantiation, validate the framework's correctness and demonstrate its practical utility for uncertainty-aware evaluation and decision-making.
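
The three-way partition of pairwise rankings described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions: a positive-negative pair counts as correct when the positive's interval lies strictly above the negative's, as incorrect when it lies strictly below, and as uncertain when the intervals overlap or touch; $AUC_L$ is taken as the fraction of correct pairs and $AUC_U$ as one minus the fraction of incorrect pairs. The function name `interval_auc_bounds` and its arguments are hypothetical.

```python
import numpy as np

def interval_auc_bounds(lower, upper, labels):
    """Pairwise AUC bounds for interval-valued scores.

    ASSUMED definitions (not taken from the paper's text):
      correct   : positive interval strictly above negative interval
      incorrect : positive interval strictly below negative interval
      uncertain : intervals overlap or touch
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    is_pos = np.asarray(labels).astype(bool)

    pos_lo, pos_hi = lower[is_pos], upper[is_pos]
    neg_lo, neg_hi = lower[~is_pos], upper[~is_pos]
    if pos_lo.size == 0 or neg_lo.size == 0:
        raise ValueError("need at least one positive and one negative case")

    # Broadcast to a (n_pos, n_neg) grid of all positive-negative pairs.
    correct = pos_lo[:, None] > neg_hi[None, :]
    incorrect = pos_hi[:, None] < neg_lo[None, :]

    n_pairs = correct.size
    auc_l = float(correct.sum() / n_pairs)
    auc_u = float(1.0 - incorrect.sum() / n_pairs)
    return {"auc_l": auc_l, "auc_u": auc_u, "uncertain": auc_u - auc_l}

# Toy data: two negatives followed by two positives.
labels = [0, 0, 1, 1]
lower = [0.1, 0.4, 0.3, 0.7]
upper = [0.2, 0.6, 0.5, 0.9]
print(interval_auc_bounds(lower, upper, labels))
```

In this toy example three of the four positive-negative pairs are separated correctly and one pair overlaps, so the sketch reports an assumed $AUC_L$ of 0.75 and $AUC_U$ of 1.0. Under these definitions the gap between the two values equals the fraction of pairs a selective ranker would abstain on.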

Related papers

Towards Anytime-Valid Statistical Watermarking [63.02116925616554]
We develop the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13-15% relative to state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-19T18:32:26Z)
LEC: Linear Expectation Constraints for False-Discovery Control in Selective Prediction and Routing Systems [95.35293543918762]
Large language models (LLMs) often generate unreliable answers, while uncertainty methods fail to fully distinguish correct from incorrect predictions. We address this issue through the lens of false discovery rate (FDR) control, ensuring that among all accepted predictions, the proportion of errors does not exceed a target risk level. We propose LEC, which reinterprets selective prediction as a constrained decision problem by enforcing a Linear Expectation Constraint.
arXiv Detail & Related papers (2025-12-01T11:27:09Z)
Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees [24.48143253497661]
UnKGCP generates prediction intervals guaranteed to contain the true score with a user-specified level of confidence. We provide theoretical guarantees for the intervals and empirically verify these guarantees. Experiments on standard benchmarks across diverse UnKGE methods further demonstrate that the intervals are sharp and effectively capture predictive uncertainty.
arXiv Detail & Related papers (2025-10-18T17:58:17Z)
Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction [13.958280616597385]
This work presents the first framework to analyze the uncertainty of LLM-based scoring by providing prediction intervals via conformal prediction. We perform extensive experiments and analysis, which show that conformal prediction can provide valid prediction intervals with coverage guarantees.
arXiv Detail & Related papers (2025-09-23T05:26:28Z)
Uncertainty-Driven Reliability: Selective Prediction and Trustworthy Deployment in Modern Machine Learning [1.2183405753834562]
This thesis investigates how uncertainty estimation can enhance the safety and trustworthiness of machine learning (ML) systems. We first show that a model's training trajectory contains rich uncertainty signals that can be exploited without altering its architecture or loss. We propose a lightweight, post-hoc abstention method that works across tasks, avoids the cost of deep ensembles, and achieves state-of-the-art selective prediction performance.
arXiv Detail & Related papers (2025-08-11T02:33:53Z)
COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z)
OT Score: An OT based Confidence Score for Source Free Unsupervised Domain Adaptation [2.6912673131004468]
We introduce the Optimal Transport (OT) score, a confidence metric derived from a novel theoretical analysis. The OT score is intuitively interpretable and theoretically rigorous. It provides principled uncertainty estimates for any given set of target pseudo-labels. It improves SFUDA performance through training-time reweighting and provides a reliable, label-free proxy for model performance.
arXiv Detail & Related papers (2025-05-16T20:09:05Z)
SConU: Selective Conformal Uncertainty in Large Language Models [59.25881667640868]
We propose a novel approach termed Selective Conformal Uncertainty (SConU). We develop two conformal p-values that are instrumental in determining whether a given sample deviates from the uncertainty distribution of the calibration set at a specific manageable risk level. Our approach not only facilitates rigorous management of miscoverage rates across both single-domain and interdisciplinary contexts, but also enhances the efficiency of predictions.
arXiv Detail & Related papers (2025-04-19T03:01:45Z)
Equal Opportunity of Coverage in Fair Regression [50.76908018786335]
We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making. We propose Equal Opportunity of Coverage (EOC) that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level.
arXiv Detail & Related papers (2023-11-03T21:19:59Z)
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework [8.572441599469597]
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov decision processes. The objective is to establish a confidence interval (CI) for the target policy value using only offline data pre-collected from unknown behavior policies. We show that our algorithm is sample-efficient, error-robust, and provably convergent even in non-linear function approximation settings.
arXiv Detail & Related papers (2023-09-23T06:35:44Z)
When Does Confidence-Based Cascade Deferral Suffice? [69.28314307469381]
Cascades are a classical strategy to enable inference cost to vary adaptively across samples. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. Despite being oblivious to the structure of the cascade, confidence-based deferral often works remarkably well in practice.
arXiv Detail & Related papers (2023-07-06T04:13:57Z)
AUC-based Selective Classification [5.406386303264086]
We propose a model-agnostic approach to associate a selection function with a given binary classifier. We provide both theoretical justifications and a novel algorithm, called $AUCross$, to achieve this goal. Experiments show that $AUCross$ succeeds in trading off coverage for AUC, improving over existing selective classification methods targeted at optimizing accuracy.
arXiv Detail & Related papers (2022-10-19T16:29:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.