Empirical Optimal Risk to Quantify Model Trustworthiness for Failure
Detection
- URL: http://arxiv.org/abs/2308.03179v1
- Date: Sun, 6 Aug 2023 18:11:42 GMT
- Title: Empirical Optimal Risk to Quantify Model Trustworthiness for Failure
Detection
- Authors: Shuang Ao, Stefan Rueger, Advaith Siddharthan
- Abstract summary: Failure detection in AI systems is a crucial safeguard for deployment in safety-critical tasks.
The Risk-coverage curve (RC) reveals the trade-off between the data coverage rate and the performance on accepted data.
We propose the Excess Area Under the Optimal RC Curve (E-AUoptRC), which takes the area in coverage from the optimal point to full coverage.
- Score: 1.192436948211501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Failure detection (FD) in AI systems is a crucial safeguard for
deployment in safety-critical tasks. The common evaluation method of FD
performance is the Risk-coverage (RC) curve, which reveals the trade-off
between the data coverage rate and the performance on accepted data. One common
way to quantify the RC curve is to calculate the area under it.
However, this metric does not inform on how suited any method is for FD, or
what the optimal coverage rate should be. As FD aims to achieve higher
performance with fewer data discarded, evaluating with partial coverage
excluding the most uncertain samples is more intuitive and meaningful than full
coverage. In addition, there is an optimal point in the coverage where the
model could achieve ideal performance theoretically. We propose the Excess Area
Under the Optimal RC Curve (E-AUoptRC), which takes the area in coverage from the
optimal point to full coverage. Further, the model performance at this
optimal point can represent both model learning ability and calibration. We
propose it as the Trust Index (TI), a complementary evaluation metric to the
overall model accuracy. We report extensive experiments on three benchmark
image datasets with ten variants of transformer and CNN models. Our results
show that our proposed methods can better reflect the model trustworthiness
than existing evaluation metrics. We further observe that a model with high
overall accuracy does not always yield a high TI, which indicates the
necessity of the proposed Trust Index as a complementary metric to
overall model accuracy. The code is available at
\url{https://github.com/AoShuang92/optimal_risk}.
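The RC curve and its area described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard selective-prediction construction (sort by confidence, accumulate risk), not the paper's released code; the function names are mine, and reading the "optimal point" as the coverage level of minimum observed risk is an assumption based on the abstract.

```python
import numpy as np

def risk_coverage_curve(confidences, correct):
    # Sort samples by descending confidence; at each coverage level,
    # risk is the error rate among the samples accepted so far.
    order = np.argsort(-confidences)
    errors = 1.0 - correct[order].astype(float)
    n = len(errors)
    coverage = np.arange(1, n + 1) / n
    risk = np.cumsum(errors) / np.arange(1, n + 1)
    return coverage, risk

def aurc(coverage, risk):
    # Area under the full RC curve (trapezoidal rule).
    return np.trapz(risk, coverage)

def optimal_point(coverage, risk):
    # Assumed reading of the paper's optimal point: the coverage
    # level at which the observed risk is minimal.
    i = int(np.argmin(risk))
    return coverage[i], risk[i]
```

For confidences `[0.9, 0.8, 0.7, 0.6]` with correctness `[1, 1, 0, 1]`, the risk trace is `[0, 0, 1/3, 1/4]`: accepting only the most confident samples keeps risk at zero until the misclassified third sample enters.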
Related papers
- PAC-Bayes Generalization Certificates for Learned Inductive Conformal
Prediction [27.434939269672288]
We use PAC-Bayes theory to obtain generalization bounds on the coverage and the efficiency of set-valued predictors.
We leverage these theoretical results to provide a practical algorithm for using calibration data to fine-tune the parameters of a model and score function.
We evaluate the approach on regression and classification tasks, and outperform baselines calibrated using a Hoeffding bound-based PAC guarantee on ICP.
arXiv Detail & Related papers (2023-12-07T19:40:44Z)
- Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization [12.79542334840646]
Uncertainty estimation plays an important role for future reliable deployment of deep segmentation models in safety-critical scenarios.
We propose a novel fine-grained reward (FGRM) framework to address uncertainty estimation.
Our method outperforms state-of-the-art methods by a clear margin on all the calibration metrics of uncertainty estimation.
arXiv Detail & Related papers (2023-11-05T17:43:37Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have uses in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
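The ATC idea in the summary above reduces to a quantile computation: choose the confidence threshold on labeled source data so that the fraction of source samples above it matches source accuracy, then report the fraction of unlabeled target samples above that threshold as the accuracy estimate. A hedged NumPy sketch (function names are mine; confidence here is assumed to be a score such as max softmax, and the full method also considers other scores):

```python
import numpy as np

def atc_threshold(source_conf, source_correct):
    # Pick threshold t so that the fraction of labeled source samples
    # with confidence above t equals the source accuracy.
    acc = source_correct.mean()
    return np.quantile(source_conf, 1 - acc)

def atc_predict_accuracy(target_conf, threshold):
    # Predicted target accuracy: fraction of unlabeled target
    # samples whose confidence exceeds the learned threshold.
    return (target_conf > threshold).mean()
```

On the source distribution itself the prediction matches source accuracy by construction; the method's value is that the same threshold often transfers to shifted target distributions.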
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations [50.37808220291108]
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations.
We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety.
We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior.
arXiv Detail & Related papers (2021-11-18T23:21:00Z)
- PDC-Net+: Enhanced Probabilistic Dense Correspondence Network [161.76275845530964]
We present the Enhanced Probabilistic Dense Correspondence Network, PDC-Net+, capable of estimating accurate dense correspondences.
We develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction.
Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-09-28T17:56:41Z)
- Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
Area under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems.
We propose a stochastic optimization method for AUPRC in deep learning with provable convergence.
arXiv Detail & Related papers (2021-04-18T06:22:21Z)
- A bandit-learning approach to multifidelity approximation [7.960229223744695]
Multifidelity approximation is an important technique in scientific computation and simulation.
We introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates.
arXiv Detail & Related papers (2021-03-29T05:29:35Z)
- Learning Accurate Dense Correspondences and When to Trust Them [161.76275845530964]
We aim to estimate a dense flow field relating two images, coupled with a robust pixel-wise confidence map.
We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty.
Our approach obtains state-of-the-art results on challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-01-05T18:54:11Z)
- Increasing Trustworthiness of Deep Neural Networks via Accuracy Monitoring [20.456742449675904]
Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice depending on the actual test dataset.
This has raised significant concerns about the trustworthiness of DNNs, especially in safety-critical applications.
We propose a neural network-based accuracy monitor model, which only takes the deployed DNN's softmax probability output as its input.
arXiv Detail & Related papers (2020-07-03T03:09:36Z)
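The monitor idea in the last entry can be sketched as a small classifier over softmax-derived features. The feature choice (top-1 probability, margin, entropy) and the logistic-regression monitor below are illustrative assumptions of my own; the paper itself trains a neural network monitor on the deployed DNN's softmax output.

```python
import numpy as np

def monitor_features(softmax_probs):
    # Features derived from the deployed model's softmax output
    # (assumed choice): top-1 probability, top-1/top-2 margin,
    # and predictive entropy.
    p = np.sort(softmax_probs, axis=1)[:, ::-1]
    top1, margin = p[:, 0], p[:, 0] - p[:, 1]
    entropy = -(softmax_probs * np.log(softmax_probs + 1e-12)).sum(axis=1)
    return np.stack([top1, margin, entropy], axis=1)

def train_monitor(feats, correct, lr=0.5, steps=2000):
    # Logistic-regression monitor predicting P(prediction is correct),
    # trained by plain gradient descent on the log loss.
    X = np.hstack([feats, np.ones((len(feats), 1))])  # append bias
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - correct) / len(X)
    return w
```

At deployment time the monitor scores each prediction of the frozen DNN without needing labels, flagging low-scoring predictions for review.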
This list is automatically generated from the titles and abstracts of the papers in this site.