Confidence Intervals for Performance Estimates in Brain MRI Segmentation
- URL: http://arxiv.org/abs/2307.10926v3
- Date: Sun, 23 Mar 2025 18:22:12 GMT
- Title: Confidence Intervals for Performance Estimates in Brain MRI Segmentation
- Authors: R. El Jurdi, G. Varoquaux, O. Colliot
- Abstract summary: We study the typical confidence intervals in the context of segmentation in 3D brain magnetic resonance imaging (MRI). We show that the test size needed to achieve a given precision is often much lower than for classification tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard deviation across the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs in the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in the context of segmentation in 3D brain magnetic resonance imaging (MRI). We carry out experiments using the standard nnU-net framework, two datasets from the Medical Decathlon challenge that concern brain MRI (hippocampus and brain tumor segmentation), and two performance measures: the Dice Similarity Coefficient and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spreads of the performance metric. Importantly, we show that the test size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples.
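The relation between test set size, spread, and interval width described in the abstract can be illustrated with a short sketch. It is not the authors' code. It assumes a 95% normal-approximation interval (mean ± 1.96 · std / sqrt(n)) as the parametric interval and a percentile bootstrap of the mean as the bootstrap estimate; the paper may use different variants. The function names (`parametric_ci`, `bootstrap_ci`, `required_test_size`) and the simulated scores are hypothetical.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumed confidence level)


def parametric_ci(scores, z=Z_95):
    """Normal-approximation interval for the mean of per-image scores."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean - z * sem, mean + z * sem


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = n_resamples - 1 - lo_idx  # symmetric upper index
    return means[lo_idx], means[hi_idx]


def required_test_size(std, width, z=Z_95):
    """Smallest n such that the full interval width 2 * z * std / sqrt(n) <= width."""
    return math.ceil((2 * z * std / width) ** 2)


if __name__ == "__main__":
    # Synthetic Dice scores in percent (mean 85, std 3); illustrative data only.
    rng = random.Random(1)
    scores = [rng.gauss(85.0, 3.0) for _ in range(150)]
    print("parametric:", parametric_ci(scores))
    print("bootstrap: ", bootstrap_ci(scores))
    print("n for 1%-wide CI at std 3%: ", required_test_size(3.0, 1.0))
    print("n for 1%-wide CI at std 10%:", required_test_size(10.0, 1.0))
```

Under these assumptions, a standard deviation of 3% and a target width of 1% give n = 139, which falls in the 100-200 range quoted in the abstract; a spread of 10% gives n = 1537, in line with the "over 1000 samples" statement for harder tasks.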
Related papers
- Robust Conformal Volume Estimation in 3D Medical Images [0.5799785223420274]
Volumetry is one of the principal downstream applications of 3D medical image segmentation.
We propose an efficient approach for density ratio estimation relying on the compressed latent representations generated by the segmentation model.
arXiv Detail & Related papers (2024-07-29T12:18:07Z)
- Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation [58.77994391566484]
We propose W1KP, a human-calibrated measure of variability in a set of images.
Our best perceptual distance outperforms nine baselines by up to 18 points in accuracy.
We analyze 56 linguistic features of real prompts, finding that the prompt's length, CLIP embedding norm, concreteness, and word senses influence variability most.
arXiv Detail & Related papers (2024-06-12T17:59:27Z)
- Medical Image Segmentation with InTEnt: Integrated Entropy Weighting for Single Image Test-Time Adaptation [6.964589353845092]
Test-time adaptation (TTA) refers to adapting a trained model to a new domain during testing.
Here, we propose to adapt a medical image segmentation model with only a single unlabeled test image.
Our method, validated on 24 source/target domain splits across 3 medical image datasets, surpasses the leading method by 2.9% Dice coefficient on average.
arXiv Detail & Related papers (2024-02-14T22:26:07Z)
- Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DEviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z)
- How precise are performance estimates for typical medical image segmentation tasks? [3.606795745041439]
In this paper, we aim to estimate the typical confidence that can be expected in medical image segmentation studies.
We extensively study precision estimation using both a Gaussian assumption and bootstrapping.
Overall, our work shows that small test sets lead to wide confidence intervals.
arXiv Detail & Related papers (2022-10-26T12:53:15Z)
- DOMINO: Domain-aware Model Calibration in Medical Image Segmentation [51.346121016559024]
Modern deep neural networks are poorly calibrated, compromising trustworthiness and reliability.
We propose DOMINO, a domain-aware model calibration method that leverages the semantic confusability and hierarchical similarity between class labels.
Our results show that DOMINO-calibrated deep neural networks outperform non-calibrated models and state-of-the-art morphometric methods in head image segmentation.
arXiv Detail & Related papers (2022-09-13T15:31:52Z)
- Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin [92.76372026435858]
We learn an Adaptive Confidence Margin (Ada-CM) to fully leverage all unlabeled data for semi-supervised deep facial expression recognition.
All unlabeled samples are partitioned into two subsets by comparing their confidence scores with the adaptively learned confidence margin.
Our method achieves state-of-the-art performance, especially surpassing fully-supervised baselines in a semi-supervised manner.
arXiv Detail & Related papers (2022-03-23T11:43:29Z)
- Layer Ensembles: A Single-Pass Uncertainty Estimation in Deep Learning for Segmentation [7.856209828002792]
We propose Layer Ensembles, a novel uncertainty estimation method that uses a single network and requires only a single pass to estimate predictive uncertainty of a network.
We evaluate our approach on 2D and 3D, binary and multi-class medical image segmentation tasks.
arXiv Detail & Related papers (2022-03-16T18:46:53Z)
- Trustworthy Medical Segmentation with Uncertainty Estimation [0.7829352305480285]
This paper introduces a new Bayesian deep learning framework for uncertainty quantification in segmentation neural networks.
We evaluate the proposed framework on medical image segmentation data from Magnetic Resonance Imaging and Computed Tomography scans.
Our experiments on multiple benchmark datasets demonstrate that the proposed framework is more robust to noise and adversarial attacks as compared to state-of-the-art segmentation models.
arXiv Detail & Related papers (2021-11-10T22:46:05Z)
- Inconsistency-aware Uncertainty Estimation for Semi-supervised Medical Image Segmentation [92.9634065964963]
We present a new semi-supervised segmentation model, namely, conservative-radical network (CoraNet) based on our uncertainty estimation and separate self-training strategy.
Compared with the current state of the art, our CoraNet has demonstrated superior performance.
arXiv Detail & Related papers (2021-10-17T08:49:33Z)
- SSEGEP: Small SEGment Emphasized Performance evaluation metric for medical image segmentation [0.0]
"SSEGEP" (Small SEGment Emphasized Performance evaluation metric) ranges from 0 (bad) to 1 (good).
Across 33 fundus images, where the largest exudate is 1.41% and the smallest is 0.0002% of the image, the proposed metric is 30% closer to MOS than the Dice Similarity Coefficient (DSC).
arXiv Detail & Related papers (2021-09-08T05:05:49Z)
- Medical Instrument Segmentation in 3D US by Hybrid Constrained Semi-Supervised Learning [62.13520959168732]
We propose a semi-supervised learning framework for instrument segmentation in 3D US.
To achieve SSL, a Dual-UNet is proposed to segment the instrument.
Our proposed method achieves a Dice score of about 68.6%-69.1% and an inference time of about 1 sec. per volume.
arXiv Detail & Related papers (2021-07-30T07:59:45Z)
- Uncertainty Quantification using Variational Inference for Biomedical Image Segmentation [0.0]
We use an encoder-decoder architecture based on variational inference techniques for segmenting brain tumour images.
We evaluate our work on the publicly available BRATS dataset using Dice Similarity Coefficient (DSC) and Intersection Over Union (IOU) as the evaluation metrics.
arXiv Detail & Related papers (2020-08-12T20:08:04Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- Towards GAN Benchmarks Which Require Generalization [48.075521136623564]
We argue that estimating the function must require a large sample from the model.
We turn to neural network divergences (NNDs) which are defined in terms of a neural network trained to distinguish between distributions.
The resulting benchmarks cannot be "won" by training set memorization, while still being perceptually correlated and computable only from samples.
arXiv Detail & Related papers (2020-01-10T20:18:47Z)