Theoretical analysis and experimental validation of volume bias of soft
Dice optimized segmentation maps in the context of inherent uncertainty
- URL: http://arxiv.org/abs/2211.04161v1
- Date: Tue, 8 Nov 2022 11:04:52 GMT
- Title: Theoretical analysis and experimental validation of volume bias of soft
Dice optimized segmentation maps in the context of inherent uncertainty
- Authors: Jeroen Bertels, David Robben, Dirk Vandermeulen, Paul Suetens
- Abstract summary: Recent segmentation methods use a differentiable surrogate metric, such as soft Dice, as part of the loss function during the learning phase.
We first briefly describe how to derive volume estimates from a segmentation that is, potentially, inherently uncertain or ambiguous.
We find that, even though soft Dice optimization leads to an improved performance with respect to the Dice score and other measures, it may introduce a volume bias for tasks with high inherent uncertainty.
- Score: 6.692460499366963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The clinical interest is often to measure the volume of a structure, which is
typically derived from a segmentation. In order to evaluate and compare
segmentation methods, the similarity between a segmentation and a predefined
ground truth is measured using popular discrete metrics, such as the Dice
score. Recent segmentation methods use a differentiable surrogate metric, such
as soft Dice, as part of the loss function during the learning phase. In this
work, we first briefly describe how to derive volume estimates from a
segmentation that is, potentially, inherently uncertain or ambiguous. This is
followed by a theoretical analysis and an experimental validation linking the
inherent uncertainty to common loss functions for training CNNs, namely
cross-entropy and soft Dice. We find that, even though soft Dice optimization
leads to an improved performance with respect to the Dice score and other
measures, it may introduce a volume bias for tasks with high inherent
uncertainty. These findings indicate some of the method's clinical limitations
and suggest a closer ad-hoc volume analysis with an optional re-calibration
step.
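The volume bias described in the abstract can be sketched numerically. The toy setup below is illustrative and not from the paper: an image with a certain foreground region plus a region whose label is inherently uncertain (each plausible annotation marks those voxels foreground with probability 0.3). Grid-searching a shared prediction for the uncertain voxels shows that the cross-entropy optimum matches the marginal probability (an unbiased soft volume), while the soft Dice optimum collapses toward 0 and underestimates the volume.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy image (sizes and probabilities are illustrative):
# 100 voxels of certain foreground, 200 voxels whose label is inherently
# uncertain -- each annotation marks them foreground with probability 0.3.
n_certain, n_uncertain, p_fg = 100, 200, 0.3
n_draws = 2000  # sampled plausible ground-truth annotations

# k[j] = number of uncertain voxels labelled foreground in draw j
k = rng.binomial(n_uncertain, p_fg, size=n_draws)

def mean_soft_dice_loss(p):
    """Mean (1 - soft Dice) when predicting 1 on certain voxels and p on uncertain ones."""
    inter = n_certain + p * k                               # soft intersection per draw
    denom = (n_certain + n_uncertain * p) + (n_certain + k)  # sum of soft pred + ground truth
    return np.mean(1.0 - 2.0 * inter / denom)

def mean_ce_loss(p):
    """Mean cross-entropy over the uncertain voxels (certain voxels contribute 0)."""
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(k * np.log(p) + (n_uncertain - k) * np.log(1 - p)) / n_uncertain

grid = np.linspace(0.0, 1.0, 1001)
p_dice = grid[np.argmin([mean_soft_dice_loss(p) for p in grid])]
p_ce = grid[np.argmin([mean_ce_loss(p) for p in grid])]

true_vol = n_certain + n_uncertain * p_fg    # expected ground-truth volume: 160
vol_dice = n_certain + n_uncertain * p_dice  # soft-Dice volume estimate
vol_ce = n_certain + n_uncertain * p_ce      # cross-entropy volume estimate

print(f"expected true volume: {true_vol:.0f}")
print(f"soft Dice optimum p={p_dice:.2f} -> volume {vol_dice:.0f}")    # p near 0: underestimates
print(f"cross-entropy optimum p={p_ce:.2f} -> volume {vol_ce:.0f}")    # p near 0.3: roughly unbiased
```

Because the uncertain voxels' marginal probability is below 0.5, soft Dice is minimized by predicting hard background there; with a marginal above 0.5 the bias would flip to overestimation, which is the asymmetry the paper's analysis formalizes.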
Related papers
- Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks [60.80828925396154]
We present Connected-Component(CC)-Metrics, a novel semantic segmentation evaluation protocol.
We motivate this setup in the common medical scenario of semantic segmentation in a full-body PET/CT.
We show how existing semantic segmentation metrics suffer from a bias towards larger connected components.
arXiv Detail & Related papers (2024-10-24T12:26:05Z)
- A Generalization Theory of Cross-Modality Distillation with Contrastive Learning [49.35244441141323]
Cross-modality distillation arises as an important topic for data modalities containing limited knowledge.
We formulate a general framework of cross-modality contrastive distillation (CMCD), built upon contrastive learning.
Our algorithm outperforms existing algorithms consistently by a margin of 2-3% across diverse modalities and tasks.
arXiv Detail & Related papers (2024-05-06T11:05:13Z)
- Segmentation Quality and Volumetric Accuracy in Medical Imaging [0.9426448361599084]
Current medical image segmentation relies on the region-based (Dice, F1-score) and boundary-based (Hausdorff distance, surface distance) metrics as the de-facto standard.
While these metrics are widely used, they lack a unified interpretation, particularly regarding volume agreement.
We utilize relative volume prediction error (vpe) to directly assess the accuracy of volume predictions derived from segmentation tasks.
arXiv Detail & Related papers (2024-04-27T00:49:39Z)
- Marginal Thresholding in Noisy Image Segmentation [3.609538870261841]
It is shown that optimal solutions to the loss functions soft-Dice and cross-entropy diverge as the level of noise increases.
This raises the question whether the decrease in performance seen when using cross-entropy as compared to soft-Dice is caused by using the wrong threshold.
arXiv Detail & Related papers (2023-04-08T22:27:36Z)
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness have impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z)
- Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Expert [24.216869988183092]
We focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images.
We propose a novel mixture of experts (MoSE) model, where each expert network estimates a distinct mode of aleatoric uncertainty.
We develop a Wasserstein-like loss that directly minimizes the distribution distance between the MoSE and ground truth annotations.
arXiv Detail & Related papers (2022-12-14T16:48:21Z)
- The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z)
- Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Segmentation with Multiple Acceptable Annotations: A Case Study of Myocardial Segmentation in Contrast Echocardiography [12.594060034146125]
We propose a new extended Dice to evaluate segmentation performance when multiple accepted ground truths are available.
We then solve the second problem by further incorporating the new metric into a loss function that enables neural networks to learn general features of myocardium.
Experiment results on our clinical MCE data set demonstrate that the neural network trained with the proposed loss function outperforms existing ones.
arXiv Detail & Related papers (2021-06-29T17:32:24Z)
- Optimization for Medical Image Segmentation: Theory and Practice when evaluating with Dice Score or Jaccard Index [25.04858968806884]
We investigate the relation within the group of metric-sensitive loss functions.
We find that the Dice score and Jaccard index approximate each other relatively and absolutely.
We verify these results empirically in an extensive validation on six medical segmentation tasks.
arXiv Detail & Related papers (2020-10-26T11:45:55Z)
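The last entry's observation that the Dice score and Jaccard index approximate each other rests on an exact identity, Dice = 2J / (1 + J), which makes the two metrics monotone transforms of one another. A short sketch (mask shapes and seed are arbitrary) verifies this:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two random binary masks, purely for illustration
a = rng.random((64, 64)) > 0.5
b = rng.random((64, 64)) > 0.5

inter = np.logical_and(a, b).sum()
union = np.logical_or(a, b).sum()

jaccard = inter / union
dice = 2 * inter / (a.sum() + b.sum())  # |A| + |B| = |A ∩ B| + |A ∪ B|

# Exact identity: Dice = 2J / (1 + J), so each metric determines the other
print(f"Dice={dice:.3f}, Jaccard={jaccard:.3f}")
```

Since the identity is exact and monotone, ranking segmentations by Dice or by Jaccard gives the same order; the two differ only in scale, with J <= D always.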
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.