Empirical Optimal Risk to Quantify Model Trustworthiness for Failure
Detection
- URL: http://arxiv.org/abs/2308.03179v1
- Date: Sun, 6 Aug 2023 18:11:42 GMT
- Title: Empirical Optimal Risk to Quantify Model Trustworthiness for Failure
Detection
- Authors: Shuang Ao, Stefan Rueger, Advaith Siddharthan
- Abstract summary: Failure detection in AI systems is a crucial safeguard for deployment in safety-critical tasks.
The Risk-coverage curve (RC) reveals the trade-off between the data coverage rate and the performance on accepted data.
We propose the Excess Area Under the Optimal RC Curve (E-AUoptRC), which takes the area in coverage from the optimal point to full coverage.
- Score: 1.192436948211501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Failure detection (FD) in AI systems is a crucial safeguard for
deployment in safety-critical tasks. The common evaluation method of FD
performance is the Risk-coverage (RC) curve, which reveals the trade-off
between the data coverage rate and the performance on accepted data. One common
way to quantify the RC curve is to calculate the area under it.
However, this metric does not inform on how suited any method is for FD, or
what the optimal coverage rate should be. As FD aims to achieve higher
performance with fewer data discarded, evaluating with partial coverage
excluding the most uncertain samples is more intuitive and meaningful than full
coverage. In addition, there is an optimal point in the coverage where the
model could achieve ideal performance theoretically. We propose the Excess Area
Under the Optimal RC Curve (E-AUoptRC), which takes the area in coverage from the
optimal point to full coverage. Further, the model performance at this
optimal point can represent both model learning ability and calibration. We
propose it as the Trust Index (TI), a complementary evaluation metric to the
overall model accuracy. We report extensive experiments on three benchmark
image datasets with ten variants of transformer and CNN models. Our results
show that our proposed methods can better reflect the model trustworthiness
than existing evaluation metrics. We further observe that a model with high
overall accuracy does not always yield a high TI, which indicates the
necessity of the proposed Trust Index as a complementary metric to
overall model accuracy. The code is available at
\url{https://github.com/AoShuang92/optimal_risk}.
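The RC curve and its area described above can be sketched in a few lines of NumPy. This is a minimal illustration of the standard selective-prediction construction (sort by confidence, accumulate risk), not the paper's released code; the function names are mine, and reading the "optimal point" as the coverage level of minimum observed risk is an assumption based on the abstract.

```python
import numpy as np

def risk_coverage_curve(confidences, correct):
    # Sort samples by descending confidence; at each coverage level,
    # risk is the error rate among the samples accepted so far.
    order = np.argsort(-confidences)
    errors = 1.0 - correct[order].astype(float)
    n = len(errors)
    coverage = np.arange(1, n + 1) / n
    risk = np.cumsum(errors) / np.arange(1, n + 1)
    return coverage, risk

def aurc(coverage, risk):
    # Area under the full RC curve (trapezoidal rule).
    return np.trapz(risk, coverage)

def optimal_point(coverage, risk):
    # Assumed reading of the paper's optimal point: the coverage
    # level at which the observed risk is minimal.
    i = int(np.argmin(risk))
    return coverage[i], risk[i]
```

For confidences `[0.9, 0.8, 0.7, 0.6]` with correctness `[1, 1, 0, 1]`, the risk trace is `[0, 0, 1/3, 1/4]`: accepting only the most confident samples keeps risk at zero until the misclassified third sample enters.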
Related papers
- PAC-Bayes Generalization Certificates for Learned Inductive Conformal
Prediction [27.434939269672288]
We use PAC-Bayes theory to obtain generalization bounds on the coverage and the efficiency of set-valued predictors.
We leverage these theoretical results to provide a practical algorithm for using calibration data to fine-tune the parameters of a model and score function.
We evaluate the approach on regression and classification tasks, and outperform baselines calibrated using a Hoeffding bound-based PAC guarantee on ICP.
arXiv Detail & Related papers (2023-12-07T19:40:44Z)
- Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization [12.79542334840646]
Uncertainty estimation plays an important role for future reliable deployment of deep segmentation models in safety-critical scenarios.
We propose a novel fine-grained reward (FGRM) framework to address uncertainty estimation.
Our method outperforms state-of-the-art methods by a clear margin on all the calibration metrics of uncertainty estimation.
arXiv Detail & Related papers (2023-11-05T17:43:37Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have uses in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled examples whose confidence exceeds the threshold.
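The ATC idea in the summary above reduces to a quantile computation: choose the confidence threshold on labeled source data so that the fraction of source samples above it matches source accuracy, then report the fraction of unlabeled target samples above that threshold as the accuracy estimate. A hedged NumPy sketch (function names are mine; confidence here is assumed to be a score such as max softmax, and the full method also considers other scores):

```python
import numpy as np

def atc_threshold(source_conf, source_correct):
    # Pick threshold t so that the fraction of labeled source samples
    # with confidence above t equals the source accuracy.
    acc = source_correct.mean()
    return np.quantile(source_conf, 1 - acc)

def atc_predict_accuracy(target_conf, threshold):
    # Predicted target accuracy: fraction of unlabeled target
    # samples whose confidence exceeds the learned threshold.
    return (target_conf > threshold).mean()
```

On the source distribution itself the prediction matches source accuracy by construction; the method's value is that the same threshold often transfers to shifted target distributions.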
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations [50.37808220291108]
This paper addresses learning safe output feedback control laws from partial observations of expert demonstrations.
We first propose robust output control barrier functions (ROCBFs) as a means to guarantee safety.
We then formulate an optimization problem to learn ROCBFs from expert demonstrations that exhibit safe system behavior.
arXiv Detail & Related papers (2021-11-18T23:21:00Z)
- PDC-Net+: Enhanced Probabilistic Dense Correspondence Network [161.76275845530964]
We present the Enhanced Probabilistic Dense Correspondence Network, PDC-Net+, capable of estimating accurate dense correspondences.
We develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction.
Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-09-28T17:56:41Z)
- Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence [66.83161885378192]
Area under ROC (AUROC) and precision-recall curves (AUPRC) are common metrics for evaluating classification performance for imbalanced problems.
We propose a stochastic optimization method for AUPRC in deep learning with provable convergence.
arXiv Detail & Related papers (2021-04-18T06:22:21Z)
- A bandit-learning approach to multifidelity approximation [7.960229223744695]
Multifidelity approximation is an important technique in scientific computation and simulation.
We introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates.
arXiv Detail & Related papers (2021-03-29T05:29:35Z)
- Learning Accurate Dense Correspondences and When to Trust Them [161.76275845530964]
We aim to estimate a dense flow field relating two images, coupled with a robust pixel-wise confidence map.
We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty.
Our approach obtains state-of-the-art results on challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-01-05T18:54:11Z)
- Increasing Trustworthiness of Deep Neural Networks via Accuracy Monitoring [20.456742449675904]
Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice depending on the actual test dataset.
This has raised significant concerns about the trustworthiness of DNNs, especially in safety-critical applications.
We propose a neural network-based accuracy monitor model, which only takes the deployed DNN's softmax probability output as its input.
arXiv Detail & Related papers (2020-07-03T03:09:36Z)
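The monitor idea in the last entry can be sketched as a small classifier over softmax-derived features. The feature choice (top-1 probability, margin, entropy) and the logistic-regression monitor below are illustrative assumptions of my own; the paper itself trains a neural network monitor on the deployed DNN's softmax output.

```python
import numpy as np

def monitor_features(softmax_probs):
    # Features derived from the deployed model's softmax output
    # (assumed choice): top-1 probability, top-1/top-2 margin,
    # and predictive entropy.
    p = np.sort(softmax_probs, axis=1)[:, ::-1]
    top1, margin = p[:, 0], p[:, 0] - p[:, 1]
    entropy = -(softmax_probs * np.log(softmax_probs + 1e-12)).sum(axis=1)
    return np.stack([top1, margin, entropy], axis=1)

def train_monitor(feats, correct, lr=0.5, steps=2000):
    # Logistic-regression monitor predicting P(prediction is correct),
    # trained by plain gradient descent on the log loss.
    X = np.hstack([feats, np.ones((len(feats), 1))])  # append bias
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - correct) / len(X)
    return w
```

At deployment time the monitor scores each prediction of the frozen DNN without needing labels, flagging low-scoring predictions for review.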
This list is automatically generated from the titles and abstracts of the papers in this site.