Average Calibration Error: A Differentiable Loss for Improved
Reliability in Image Segmentation
- URL: http://arxiv.org/abs/2403.06759v1
- Date: Mon, 11 Mar 2024 14:31:03 GMT
- Title: Average Calibration Error: A Differentiable Loss for Improved
Reliability in Image Segmentation
- Authors: Theodore Barfoot and Luis Garcia-Peraza-Herrera and Ben Glocker and
Tom Vercauteren
- Abstract summary: We propose to use marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss function to improve pixel-wise calibration without compromising segmentation quality.
We show that this loss, despite using hard binning, is directly differentiable, bypassing the need for approximate but differentiable surrogate or soft binning approaches.
- Score: 17.263160921956445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks for medical image segmentation often produce
overconfident results misaligned with empirical observations. Such
miscalibration challenges their clinical translation. We propose to use
marginal L1 average calibration error (mL1-ACE) as a novel auxiliary loss
function to improve pixel-wise calibration without compromising segmentation
quality. We show that this loss, despite using hard binning, is directly
differentiable, bypassing the need for approximate but differentiable surrogate
or soft binning approaches. Our work also introduces the concept of dataset
reliability histograms which generalises standard reliability diagrams for
refined visual assessment of calibration in semantic segmentation aggregated at
the dataset level. Using mL1-ACE, we reduce average and maximum calibration
error by 45% and 55%, respectively, while maintaining a Dice score of 87% on the BraTS
2021 dataset. We share our code here: https://github.com/cai4cai/ACE-DLIRIS
Related papers
- We Care Each Pixel: Calibrating on Medical Segmentation Model [15.826029150910566]
Pixel-wise Expected Calibration Error (pECE) is a novel metric that measures miscalibration at the pixel level.
We also introduce a morphological adaptation strategy that applies morphological operations to ground-truth masks before computing calibration losses.
Our method not only enhances segmentation performance but also improves calibration quality, yielding more trustworthy confidence estimates.
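The summary does not specify which morphological operations are used, so the snippet below is only a hedged sketch of the general idea: relax the ground-truth mask with a small dilation before the calibration term is evaluated, so that pixels near class boundaries are not penalised as confidently wrong. The max-pooling trick and the kernel radius are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F


def dilate_onehot(onehot_mask: torch.Tensor, radius: int = 1) -> torch.Tensor:
    """Binary dilation of a one-hot mask (N, C, H, W) via max-pooling.

    The dilated (relaxed) mask can stand in for the hard ground truth
    wherever a calibration loss is computed.
    """
    k = 2 * radius + 1
    return F.max_pool2d(onehot_mask.float(), kernel_size=k, stride=1, padding=radius)
```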
arXiv Detail & Related papers (2025-03-07T03:06:03Z)
- Orthogonal Causal Calibration [55.28164682911196]
We develop general algorithms for reducing the task of causal calibration to that of calibrating a standard (non-causal) predictive model.
Our results are exceedingly general, showing that essentially any existing calibration algorithm can be used in causal settings.
arXiv Detail & Related papers (2024-06-04T03:35:25Z)
- Image-level Regression for Uncertainty-aware Retinal Image Segmentation [3.7141182051230914]
We introduce a novel Uncertainty-Aware (SAUNA) transform, which adds pixel uncertainty to the ground truth.
Our results indicate that the integration of the SAUNA transform and these segmentation losses led to significant performance boosts for different segmentation models.
arXiv Detail & Related papers (2024-05-27T04:17:10Z)
- Asymptotic Characterisation of Robust Empirical Risk Minimisation Performance in the Presence of Outliers [18.455890316339595]
We study robust linear regression in high dimensions, when both the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n/d$, and study a data model that includes outliers.
We provide exact asymptotics for the performance of empirical risk minimisation (ERM) using $\ell_2$-regularised $\ell_2$, $\ell_1$, and Huber losses.
arXiv Detail & Related papers (2023-05-30T12:18:39Z)
- DOMINO: Domain-aware Model Calibration in Medical Image Segmentation [51.346121016559024]
Modern deep neural networks are poorly calibrated, compromising trustworthiness and reliability.
We propose DOMINO, a domain-aware model calibration method that leverages the semantic confusability and hierarchical similarity between class labels.
Our results show that DOMINO-calibrated deep neural networks outperform non-calibrated models and state-of-the-art morphometric methods in head image segmentation.
arXiv Detail & Related papers (2022-09-13T15:31:52Z)
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A common post-hoc approach to compensate for miscalibrated neural networks is temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
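A minimal sketch of the sample-dependent idea is given below, assuming a small auxiliary head that maps each sample's logits to its own temperature; the actual architecture and inputs used in the paper may differ.

```python
import torch
import torch.nn as nn


class SampleTemperature(nn.Module):
    """Per-sample temperature scaling (illustrative sketch).

    Standard temperature scaling divides every logit vector by one global
    scalar; here each input gets its own temperature, so the mismatch
    between confidence and accuracy can be corrected sample by sample.
    """

    def __init__(self, n_classes: int):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(n_classes, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        t = nn.functional.softplus(self.head(logits)) + 1e-3  # keep temperature positive
        return logits / t
```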
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
- Adaptation to CT Reconstruction Kernels by Enforcing Cross-domain Feature Maps Consistency [0.06117371161379209]
We show a decrease in COVID-19 segmentation quality when a model trained on smooth reconstruction kernels is tested on sharp ones.
We propose an unsupervised adaptation method, called F-Consistency, that outperforms previous approaches.
arXiv Detail & Related papers (2022-03-28T10:00:03Z)
- A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration [12.449806152650657]
We propose a novel auxiliary loss function: Multi-class Difference in Confidence and Accuracy (MDCA).
We show that training with MDCA leads to better-calibrated models in terms of Expected Calibration Error (ECE) and Static Calibration Error (SCE) on image classification and segmentation tasks.
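Based on this description, an MDCA-style term for a classification mini-batch can be sketched as below (per-class batch-mean confidence versus per-class batch frequency); the paper's exact formulation may differ in details such as normalisation.

```python
import torch


def mdca_loss(probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Multi-class Difference in Confidence and Accuracy (illustrative sketch).

    probs:  (B, C) softmax probabilities.
    target: (B,) integer class labels.
    """
    n_classes = probs.shape[1]
    onehot = torch.nn.functional.one_hot(target, n_classes).float()
    # Compare batch-mean confidence with batch frequency, class by class.
    return (probs.mean(dim=0) - onehot.mean(dim=0)).abs().mean()
```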
arXiv Detail & Related papers (2022-03-25T18:02:13Z)
- Localized Calibration: Metrics and Recalibration [133.07044916594361]
We propose the local calibration error (LCE), a fine-grained calibration metric that spans the gap between fully global and fully individualized calibration.
We then introduce a localized recalibration method, LoRe, that reduces the LCE more effectively than existing recalibration methods.
arXiv Detail & Related papers (2021-02-22T07:22:12Z)
- Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks, called AEP-Net, for the error map prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
The AEP-Net achieves average DSCs of 0.8358 and 0.8164 on the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)
- Calibration of Neural Networks using Splines [51.42640515410253]
Measuring calibration error amounts to comparing two empirical distributions.
We introduce a binning-free calibration measure inspired by the classical Kolmogorov-Smirnov (KS) statistical test.
Our method consistently outperforms existing methods on KS error as well as other commonly used calibration measures.
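One way to realise such a binning-free measure is sketched below: sort predictions by confidence and take the largest gap between the cumulative confidence and cumulative accuracy curves, in the spirit of a Kolmogorov-Smirnov statistic. This is an illustrative reading of the idea rather than the paper's exact estimator.

```python
import torch


def ks_calibration_error(conf: torch.Tensor, correct: torch.Tensor) -> torch.Tensor:
    """KS-style, binning-free calibration error (illustrative sketch).

    conf:    (N,) predicted confidences in [0, 1].
    correct: (N,) 1 if the prediction was correct, else 0.
    """
    order = torch.argsort(conf)
    n = conf.numel()
    cum_conf = torch.cumsum(conf[order], dim=0) / n
    cum_acc = torch.cumsum(correct[order].float(), dim=0) / n
    return (cum_conf - cum_acc).abs().max()
```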
arXiv Detail & Related papers (2020-06-23T07:18:05Z)
- Mix-n-Match: Ensemble and Compositional Methods for Uncertainty Calibration in Deep Learning [21.08664370117846]
We show how Mix-n-Match calibration strategies can help achieve remarkably better data-efficiency and expressive power.
We also reveal potential issues in standard evaluation practices.
Our approaches outperform state-of-the-art solutions on both the calibration as well as the evaluation tasks.
arXiv Detail & Related papers (2020-03-16T17:00:35Z)
- Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
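For reference, the standard focal loss this result builds on, -(1 - p_t)^gamma * log(p_t), takes only a few lines; the default gamma below is an arbitrary choice for illustration, not a value recommended by the paper.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits: torch.Tensor, target: torch.Tensor, gamma: float = 3.0) -> torch.Tensor:
    """Multi-class focal loss (illustrative sketch).

    Down-weighting already well-classified samples discourages the
    over-confident logit magnitudes that plain cross-entropy encourages,
    which is the mechanism behind its calibration benefit.
    """
    log_pt = F.log_softmax(logits, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-((1.0 - pt) ** gamma) * log_pt).mean()
```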
arXiv Detail & Related papers (2020-02-21T17:35:50Z)