Related papers: From Misclassifications to Outliers: Joint Reliability Assessment in Classification

From Misclassifications to Outliers: Joint Reliability Assessment in Classification

URL: http://arxiv.org/abs/2603.03903v1
Date: Wed, 04 Mar 2026 10:11:51 GMT
Title: From Misclassifications to Outliers: Joint Reliability Assessment in Classification
Authors: Yang Li, Youyang Sha, Yinzhi Wang, Timothy Hospedales, Xi Shen, Shell Xu Hu, Xuanlong Yu,
Abstract summary: A reliable system should not only detect out-of-distribution (OOD) inputs but also anticipate in-distribution (ID) errors.<n>We propose a unified evaluation framework that integrates OOD detection and failure prediction.<n>We introduce SURE+, a new approach that significantly improves reliability across diverse scenarios.
Score: 13.95428115564986
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Building reliable classifiers is a fundamental challenge for deploying machine learning in real-world applications. A reliable system should not only detect out-of-distribution (OOD) inputs but also anticipate in-distribution (ID) errors by assigning low confidence to potentially misclassified samples. Yet, most prior work treats OOD detection and failure prediction as separated problems, overlooking their closed connection. We argue that reliability requires evaluating them jointly. To this end, we propose a unified evaluation framework that integrates OOD detection and failure prediction, quantified by our new metrics DS-F1 and DS-AURC, where DS denotes double scoring functions. Experiments on the OpenOOD benchmark show that double scoring functions yield classifiers that are substantially more reliable than traditional single scoring approaches. Our analysis further reveals that OOD-based approaches provide notable gains under simple or far-OOD shifts, but only marginal benefits under more challenging near-OOD conditions. Beyond evaluation, we extend the reliable classifier SURE and introduce SURE+, a new approach that significantly improves reliability across diverse scenarios. Together, our framework, metrics, and method establish a new benchmark for trustworthy classification and offer practical guidance for deploying robust models in real-world settings. The source code is publicly available at https://github.com/Intellindust-AI-Lab/SUREPlus.

Related papers

Revisiting Logit Distributions for Reliable Out-of-Distribution Detection [73.9121001113687]
Out-of-distribution (OOD) detection is critical for ensuring the reliability of deep learning models in open-world applications.<n>LogitGap is a novel post-hoc OOD detection method that exploits the relationship between the maximum logit and the remaining logits.<n>We show that LogitGap consistently achieves state-of-the-art performance across diverse OOD detection scenarios and benchmarks.
arXiv Detail & Related papers (2025-10-23T02:16:45Z)
TrustLoRA: Low-Rank Adaptation for Failure Detection under Out-of-distribution Data [62.22804234013273]
We propose a simple failure detection framework to unify and facilitate classification with rejection under both covariate and semantic shifts.<n>Our key insight is that by separating and consolidating failure-specific reliability knowledge with low-rank adapters, we can enhance the failure detection ability effectively and flexibly.
arXiv Detail & Related papers (2025-04-20T09:20:55Z)
What If the Input is Expanded in OOD Detection? [77.37433624869857]
Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes. Various scoring functions are proposed to distinguish it from in-distribution (ID) data. We introduce a novel perspective, i.e., employing different common corruptions on the input space.
arXiv Detail & Related papers (2024-10-24T06:47:28Z)
Margin-bounded Confidence Scores for Out-of-Distribution Detection [2.373572816573706]
We propose a novel method called Margin bounded Confidence Scores (MaCS) to address the nontrivial OOD detection problem. MaCS enlarges the disparity between ID and OOD scores, which in turn makes the decision boundary more compact. Experiments on various benchmark datasets for image classification tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-09-22T05:40:25Z)
Revisiting Confidence Estimation: Towards Reliable Failure Prediction [53.79160907725975]
We find a general, widely existing but actually-neglected phenomenon that most confidence estimation methods are harmful for detecting misclassification errors. We propose to enlarge the confidence gap by finding flat minima, which yields state-of-the-art failure prediction performance.
arXiv Detail & Related papers (2024-03-05T11:44:14Z)
Distilling the Unknown to Unveil Certainty [66.29929319664167]
Out-of-distribution (OOD) detection is critical for identifying test samples that deviate from in-distribution (ID) data, ensuring network robustness and reliability.<n>This paper presents a flexible framework for OOD knowledge distillation that extracts OOD-sensitive information from a network to develop a binary classifier capable of distinguishing between ID and OOD samples.
arXiv Detail & Related papers (2023-11-14T08:05:02Z)
Free Lunch for Generating Effective Outlier Supervision [46.37464572099351]
We propose an ultra-effective method to generate near-realistic outlier supervision. Our proposed textttBayesAug significantly reduces the false positive rate over 12.50% compared with the previous schemes.
arXiv Detail & Related papers (2023-01-17T01:46:45Z)
Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data. In this paper we propose a novel method where from first principles we combine a certifiable OOD detector with a standard classifier into an OOD aware classifier. In this way we achieve the best of two worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy and close to state-of-the-art OOD detection performance for non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.