Comparing Bayesian Models for Organ Contouring in Head and Neck
Radiotherapy
- URL: http://arxiv.org/abs/2111.01134v1
- Date: Mon, 1 Nov 2021 14:46:25 GMT
- Title: Comparing Bayesian Models for Organ Contouring in Head and Neck
Radiotherapy
- Authors: Prerak Mody, Nicolas Chaves-de-Plaza, Klaus Hildebrandt, Rene van
Egmond, Huib de Ridder, Marius Staring
- Abstract summary: We investigate two Bayesian models for auto-contouring, DropOut and FlipOut, using a quantitative measure, the expected calibration error (ECE), and a qualitative measure, region-based accuracy-vs-uncertainty (R-AvU) graphs.
We show that DropOut-DICE has the highest ECE, while DropOut-CE and FlipOut-CE have the lowest ECE.
Experiments are conducted on the MICCAI2015 Head and Neck Challenge and on the DeepMindTCIA CT dataset using three models: DropOut-DICE, DropOut-CE and FlipOut-CE.
- Score: 6.499117567077562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models for organ contouring in radiotherapy are poised for
clinical usage, but currently, there exist few tools for automated quality
assessment (QA) of the predicted contours. Using Bayesian models and their
associated uncertainty, one can potentially automate the process of detecting
inaccurate predictions. We investigate two Bayesian models for auto-contouring,
DropOut and FlipOut, using a quantitative measure - expected calibration error
(ECE) and a qualitative measure - region-based accuracy-vs-uncertainty (R-AvU)
graphs. It is well understood that a model should have low ECE to be considered
trustworthy. However, in a QA context, a model should also have high
uncertainty in inaccurate regions and low uncertainty in accurate regions. Such
behaviour could direct visual attention of expert users to potentially
inaccurate regions, leading to a speed up in the QA process. Using R-AvU
graphs, we qualitatively compare the behaviour of different models in accurate
and inaccurate regions. Experiments are conducted on the MICCAI2015 Head and
Neck Segmentation Challenge and on the DeepMindTCIA CT dataset using three
models: DropOut-DICE, DropOut-CE (Cross Entropy) and FlipOut-CE. Quantitative
results show that DropOut-DICE has the highest ECE, while DropOut-CE and
FlipOut-CE have the lowest ECE. To better understand the difference between
DropOut-CE and FlipOut-CE, we use the R-AvU graph, which shows that FlipOut-CE
has better uncertainty coverage in inaccurate regions than DropOut-CE. Such a
combination of quantitative and qualitative metrics explores a new approach
that helps to select which model can be deployed as a QA tool in clinical
settings.
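For reference, the sketch below shows one standard way to compute the binned ECE scored in this paper, applied to the mean softmax of several stochastic forward passes (the usual way DropOut- or FlipOut-style Bayesian models are sampled). This is a minimal illustration under stated assumptions, not the authors' implementation; the function name, the 10-bin default and the MC-averaging recipe in the trailing comments are all assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: mean |accuracy - confidence| over equal-width bins,
    weighted by the fraction of samples falling in each bin.

    confidences : 1-D array of predicted probabilities for the chosen class.
    correct     : 1-D boolean array, True where prediction matches the label.
    """
    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()        # empirical accuracy in the bin
            conf = confidences[in_bin].mean()   # mean confidence in the bin
            ece += in_bin.mean() * abs(acc - conf)
    return ece

# Bayesian sampling (assumed recipe): run T stochastic forward passes with
# DropOut kept active at test time, or with FlipOut weight perturbations,
# then average the per-pass class probabilities before scoring:
#   probs_mc    : (T, n_voxels, n_classes) stacked per-pass probabilities
#   mean_probs  = probs_mc.mean(axis=0)
#   confidences = mean_probs.max(axis=-1)
#   correct     = mean_probs.argmax(axis=-1) == labels
```

A low ECE means the per-bin gap between mean confidence and empirical accuracy stays small, which is why it serves here as the quantitative trustworthiness criterion.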
Related papers
- Improving Uncertainty-Error Correspondence in Deep Bayesian Medical Image Segmentation [3.3572047447192626]
We train the FlipOut model with the Accuracy-vs-Uncertainty (AvU) loss, which promotes uncertainty to be present only in inaccurate regions (the underlying AvU statistic is sketched below).
We apply this method to datasets from two radiotherapy body sites, namely head-and-neck CT and prostate MR scans.
arXiv Detail & Related papers (2024-09-05T12:31:51Z)
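The AvU statistic that underlies both the R-AvU graphs above and this AvU loss counts voxels in four accuracy/uncertainty cells. Below is a minimal sketch assuming a fixed threshold binarizes a per-voxel uncertainty map (the trainable loss itself uses a differentiable relaxation); names and the thresholding choice are illustrative.

```python
import numpy as np

def accuracy_vs_uncertainty(correct, uncertainty, u_thresh):
    """AvU = (n_AC + n_IU) / N: the fraction of voxels that are either
    accurate-and-certain or inaccurate-and-uncertain.

    correct     : boolean array, prediction matches the ground truth.
    uncertainty : per-voxel uncertainty, e.g. predictive entropy.
    u_thresh    : threshold separating certain from uncertain (assumed).
    """
    uncertain = uncertainty > u_thresh
    n_ac = np.sum(correct & ~uncertain)   # accurate and certain
    n_au = np.sum(correct & uncertain)    # accurate but uncertain
    n_ic = np.sum(~correct & ~uncertain)  # inaccurate yet certain: worst for QA
    n_iu = np.sum(~correct & uncertain)   # inaccurate and uncertain: flagged
    return (n_ac + n_iu) / (n_ac + n_au + n_ic + n_iu)
```

Sweeping u_thresh and plotting the share of uncertain voxels separately over accurate and inaccurate regions gives curves in the spirit of the R-AvU graphs, though the papers' exact constructions may differ.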
- RICA2: Rubric-Informed, Calibrated Assessment of Actions [8.641411594566714]
We present RICA2, a deep probabilistic model that integrates score rubrics and accounts for prediction uncertainty for action quality assessment (AQA).
We demonstrate that our method establishes new state of the art on public benchmarks, including FineDiving, MTL-AQA, and JIGSAWS, with superior performance in score prediction and uncertainty calibration.
arXiv Detail & Related papers (2024-08-04T20:35:33Z)
- Schroedinger's Threshold: When the AUC doesn't predict Accuracy [6.091702876917282]
The Area Under the Curve (AUC) measure seems apt for evaluating and comparing diverse models.
We show that the AUC yields an academic and optimistic notion of accuracy that can misalign with the actual accuracy observed in application (a toy illustration follows below).
arXiv Detail & Related papers (2024-04-04T10:18:03Z)
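A synthetic illustration of that point: ranking quality (AUC) can be near-perfect while accuracy at the conventional decision threshold is no better than chance. The data below is fabricated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scores that rank positives above negatives almost perfectly, but which
# all sit above the naive 0.5 decision threshold.
neg = rng.uniform(0.60, 0.80, 1000)   # scores of negative examples
pos = rng.uniform(0.80, 1.00, 1000)   # scores of positive examples
scores = np.concatenate([neg, pos])
labels = np.concatenate([np.zeros(1000), np.ones(1000)])

# AUC = P(score of a positive > score of a negative), by pairwise comparison.
auc = (pos[:, None] > neg[None, :]).mean()

# Accuracy at the conventional 0.5 cut-off: everything is called positive.
acc_at_half = ((scores > 0.5) == labels).mean()

print(f"AUC       = {auc:.3f}")          # ~1.000: near-perfect ranking
print(f"Acc @ 0.5 = {acc_at_half:.3f}")  # 0.500: chance-level accuracy
```

The ranking is essentially perfect, so a threshold tuned on labeled data would recover high accuracy; the point is that AUC alone says nothing about whether any particular deployed threshold achieves it.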
- Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error [50.86671887712424]
The prevalence of domain adaptive semantic segmentation has prompted concerns regarding source domain data leakage.
To circumvent the requirement for source data, source-free domain adaptation has emerged as a viable solution.
We propose a novel calibration-guided source-free domain adaptive semantic segmentation framework (a generic differentiable ECE surrogate is sketched below).
arXiv Detail & Related papers (2023-08-06T03:28:34Z)
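Cal-SFDA's key ingredient is an ECE that can be optimized with gradients. One generic way to make binned ECE differentiable, not necessarily the paper's formulation, is to soften the histogram assignment; the temperature value below is an arbitrary assumption.

```python
import numpy as np

def soft_binned_ece(conf, correct, n_bins=10, temp=0.01):
    """Soft-binned ECE surrogate: hard histogram assignment is replaced by
    smooth Gaussian weights, so the quantity admits gradients w.r.t. the
    confidences when rewritten in an autograd framework."""
    centers = (np.arange(n_bins) + 0.5) / n_bins
    # Soft assignment of each sample to each bin, shape (n, n_bins).
    w = np.exp(-((conf[:, None] - centers[None, :]) ** 2) / temp)
    w = w / w.sum(axis=1, keepdims=True)
    mass = w.mean(axis=0)                                # per-bin sample mass
    bin_conf = (w * conf[:, None]).sum(0) / (w.sum(0) + 1e-12)
    bin_acc = (w * correct[:, None]).sum(0) / (w.sum(0) + 1e-12)
    return np.sum(mass * np.abs(bin_acc - bin_conf))
```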
- Proximity-Informed Calibration for Deep Neural Networks [49.330703634912915]
ProCal is a plug-and-play algorithm with a theoretical guarantee to adjust sample confidence based on proximity.
We show that ProCal is effective in addressing proximity bias and improving calibration in balanced, long-tail, and distribution-shift settings (the proximity notion is sketched below).
arXiv Detail & Related papers (2023-06-07T16:40:51Z)
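Proximity here refers to how densely populated a sample's neighbourhood in feature space is. One common formalisation, which may differ in detail from ProCal's, is the (negative) mean distance to the K nearest neighbours in a reference set:

```python
import numpy as np

def proximity(features, bank, k=10):
    """Proximity of each query: negative mean distance to its K nearest
    neighbours in a reference feature bank (larger = denser region).

    features : (n, d) query embeddings.
    bank     : (m, d) reference embeddings, e.g. a validation set.
    """
    # Pairwise Euclidean distances, shape (n, m).
    d = np.linalg.norm(features[:, None, :] - bank[None, :, :], axis=-1)
    knn = np.sort(d, axis=1)[:, :k]   # the K smallest distances per query
    return -knn.mean(axis=1)

# Diagnosing proximity bias (assumed procedure): bin samples by proximity and
# compare the confidence-accuracy gap across bins; ProCal-style methods then
# adjust confidence as a function of proximity.
```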
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model feature-importance (FI) supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (sketched below).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
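A minimal sketch of ATC as summarized above: choose the threshold on labeled source data so that the fraction of confidences above it matches the observed source accuracy, then report that fraction on the unlabeled target set. Function names are illustrative, and the paper also considers confidence scores other than the softmax maximum, such as negative entropy.

```python
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Pick t so that the share of source examples with confidence > t
    matches the observed source accuracy."""
    target_frac = source_correct.mean()
    for t in np.sort(source_conf):            # sweep candidate thresholds
        if (source_conf > t).mean() <= target_frac:
            return t
    return source_conf.max()

def atc_predict_accuracy(target_conf, t):
    """Predicted target accuracy: share of unlabeled target examples whose
    confidence exceeds the learned threshold."""
    return (target_conf > t).mean()
```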
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
- Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases (the focal loss form is sketched below).
arXiv Detail & Related papers (2020-02-21T17:35:50Z)
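For reference, the focal loss in that last entry down-weights examples the model already classifies confidently, FL(p_t) = -(1 - p_t)^gamma * log(p_t), which implicitly penalises over-confidence and is the mechanism the paper links to calibration. A minimal sketch; gamma is a hyperparameter, and the paper also studies sample-dependent schedules.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Multi-class focal loss: FL(p_t) = -(1 - p_t)**gamma * log(p_t).

    probs  : (n, c) predicted class probabilities, rows summing to 1.
    labels : (n,) integer class labels.
    gamma  : focusing parameter; gamma = 0 recovers cross-entropy.
    """
    p_t = probs[np.arange(len(labels)), labels]   # probability of true class
    p_t = np.clip(p_t, 1e-12, 1.0)                # numerical safety
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))
```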