Usable Region Estimate for Assessing Practical Usability of Medical
Image Segmentation Models
- URL: http://arxiv.org/abs/2207.00156v1
- Date: Fri, 1 Jul 2022 02:33:44 GMT
- Title: Usable Region Estimate for Assessing Practical Usability of Medical
Image Segmentation Models
- Authors: Yizhe Zhang, Suraj Mishra, Peixian Liang, Hao Zheng and Danny Z. Chen
- Abstract summary: We aim to quantitatively measure the practical usability of medical image segmentation models.
We first propose a measure, Correctness-Confidence Rank Correlation (CCRC), to capture how predictions' confidence estimates correlate with their correctness scores in rank.
We then propose Usable Region Estimate (URE), which simultaneously quantifies predictions' correctness and reliability of confidence assessments in one estimate.
- Score: 32.56957759180135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We aim to quantitatively measure the practical usability of medical image
segmentation models: to what extent, how often, and on which samples a model's
predictions can be used/trusted. We first propose a measure,
Correctness-Confidence Rank Correlation (CCRC), to capture how predictions'
confidence estimates correlate with their correctness scores in rank. A model
with a high value of CCRC means its prediction confidences reliably suggest
which samples' predictions are more likely to be correct. Since CCRC does not
capture the actual prediction correctness, it alone is insufficient to indicate
whether a prediction model is both accurate and reliable to use in practice.
Therefore, we further propose another method, Usable Region Estimate (URE),
which simultaneously quantifies predictions' correctness and reliability of
confidence assessments in one estimate. URE provides concrete information on to
what extent a model's predictions are usable. In addition, the sizes of usable
regions (UR) can be utilized to compare models: A model with a larger UR can be
taken as a more usable and hence better model. Experiments on six datasets
validate that the proposed evaluation methods perform well, providing a
concrete and concise measure for the practical usability of medical image
segmentation models. Code is made available at
https://github.com/yizhezhang2000/ure.
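The CCRC measure described above can be sketched as a rank correlation between per-sample confidence estimates and correctness scores (e.g., Dice). This is a minimal illustration assuming CCRC is a Spearman-style rank correlation; the function name and tie handling are simplifications, and the authors' official code at the repository above is authoritative.

```python
import numpy as np

def ccrc(confidences, correctness):
    """Correctness-Confidence Rank Correlation (sketch).

    Spearman-style rank correlation between per-sample confidence
    estimates and correctness scores (e.g., Dice). A value near 1
    means higher confidence reliably indicates a more correct
    prediction. Ordinal ranks are used; averaged ranks would be
    needed to handle ties properly.
    """
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correctness, dtype=float)
    # Double argsort converts values to ordinal ranks.
    rank_conf = conf.argsort().argsort().astype(float)
    rank_corr = corr.argsort().argsort().astype(float)
    # Pearson correlation of the ranks = Spearman correlation.
    return float(np.corrcoef(rank_conf, rank_corr)[0, 1])
```

A perfectly monotone relationship between confidence and correctness yields CCRC = 1, an inverted one yields -1.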
Related papers
- Confidence-based Estimators for Predictive Performance in Model Monitoring [0.5399800035598186]
After a machine learning model has been deployed into production, its predictive performance needs to be monitored.
Recently, novel methods for estimating the predictive performance of a model when ground truth is unavailable have been developed.
We show that under certain general assumptions, the Average Confidence (AC) method is an unbiased and consistent estimator of model accuracy.
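The Average Confidence (AC) estimator mentioned above is simple to state: estimate accuracy as the mean of the model's per-sample confidences. A minimal sketch, assuming confidences are the maximum class probability of a softmax output:

```python
import numpy as np

def average_confidence(probs):
    """Average Confidence (AC) accuracy estimate (sketch).

    Given an (n_samples, n_classes) matrix of predicted class
    probabilities, estimate accuracy as the mean of the maximum
    class probability per sample. This is unbiased only under
    the calibration assumptions discussed in the paper; with a
    miscalibrated model it can be badly biased.
    """
    probs = np.asarray(probs, dtype=float)
    return float(probs.max(axis=1).mean())
```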
arXiv Detail & Related papers (2024-07-11T16:28:31Z)
- Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation [51.809741427975105]
This work aims to assess how well a model performs under distribution shifts without using labels.
We use the nuclear norm that has been shown to be effective in characterizing both properties.
We show that the nuclear norm yields more accurate and robust accuracy estimates than existing methods.
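The idea of characterising the prediction matrix via its nuclear norm can be sketched as follows. This is an illustrative assumption-laden version: the normalising constant sqrt(min(n, k) * n) is an upper bound derived from the fact that each softmax row has L2 norm at most 1, not necessarily the exact normalisation used in the paper.

```python
import numpy as np

def nuclear_norm_score(probs):
    """Nuclear-norm score of a softmax prediction matrix (sketch).

    The nuclear norm (sum of singular values) is large when
    predictions are both confident and dispersed across classes,
    which is why it can serve as an unsupervised accuracy proxy.
    The result is scaled into [0, 1] by an upper-bound constant
    (an assumption for illustration).
    """
    probs = np.asarray(probs, dtype=float)
    n, k = probs.shape
    nuc = np.linalg.norm(probs, ord="nuc")  # sum of singular values
    return float(nuc / np.sqrt(min(n, k) * n))
```

Confident one-hot predictions spread over the classes score near 1, while a uniform prediction matrix collapses to rank one and scores much lower.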
arXiv Detail & Related papers (2023-02-02T13:30:48Z)
- Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
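The ATC procedure can be sketched in a few lines: fit the threshold on labeled source data so that the fraction of source confidences above it matches the source accuracy, then apply that threshold to unlabeled target confidences. A minimal sketch, with function and argument names chosen for illustration:

```python
import numpy as np

def atc_predict_accuracy(source_conf, source_correct, target_conf):
    """Average Thresholded Confidence (ATC) sketch.

    source_conf: confidences on labeled source data.
    source_correct: 0/1 correctness of the source predictions.
    target_conf: confidences on unlabeled target data.
    Returns the predicted target accuracy.
    """
    source_conf = np.asarray(source_conf, dtype=float)
    target_conf = np.asarray(target_conf, dtype=float)
    source_acc = float(np.mean(source_correct))
    # Choose t so the fraction of source confidences above t
    # equals the source accuracy.
    t = np.quantile(source_conf, 1.0 - source_acc)
    # Predicted accuracy = fraction of target confidences above t.
    return float(np.mean(target_conf > t))
```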
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- PDC-Net+: Enhanced Probabilistic Dense Correspondence Network [161.76275845530964]
We present PDC-Net+, an Enhanced Probabilistic Dense Correspondence Network capable of estimating accurate dense correspondences.
We develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction.
Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-09-28T17:56:41Z)
- Learning to Predict with Supporting Evidence: Applications to Clinical Risk Prediction [9.199022926064009]
The impact of machine learning models on healthcare will depend on the degree of trust that healthcare professionals place in the predictions made by these models.
We present a method that provides clinicians with domain-relevant evidence about why a prediction should be trusted.
arXiv Detail & Related papers (2021-03-04T00:26:32Z)
- Learning Accurate Dense Correspondences and When to Trust Them [161.76275845530964]
We aim to estimate a dense flow field relating two images, coupled with a robust pixel-wise confidence map.
We develop a flexible probabilistic approach that jointly learns the flow prediction and its uncertainty.
Our approach obtains state-of-the-art results on challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-01-05T18:54:11Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- Robust Validation: Confident Predictions Even When Distributions Shift [19.327409270934474]
We describe procedures for robust predictive inference, where a model provides uncertainty estimates on its predictions rather than point predictions.
We present a method that produces prediction sets (almost exactly) giving the right coverage level for any test distribution in an $f$-divergence ball around the training population.
An essential component of our methodology is to estimate the amount of expected future data shift and build robustness to it.
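The coverage-giving prediction sets described above build on the standard split-conformal construction, which the paper then robustifies against f-divergence shifts. A sketch of the non-robust i.i.d. baseline only (the robust quantile correction from the paper is omitted; names are illustrative):

```python
import numpy as np

def conformal_prediction_sets(cal_scores, test_scores_per_label, alpha=0.1):
    """Split-conformal prediction sets (sketch, i.i.d. baseline).

    cal_scores: nonconformity scores of calibration examples at
    their true labels. test_scores_per_label: (n_test, n_labels)
    scores for every candidate label. Returns, per test point, the
    set of labels whose score falls below the finite-sample
    corrected (1 - alpha) calibration quantile.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = len(cal_scores)
    # Finite-sample corrected quantile level.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_scores, q_level)
    scores = np.asarray(test_scores_per_label, dtype=float)
    return [set(np.where(row <= q)[0]) for row in scores]
```

Under exchangeability, the returned sets contain the true label with probability at least 1 - alpha; the paper's contribution is preserving such coverage when the test distribution shifts.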
arXiv Detail & Related papers (2020-08-10T17:09:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.