Uncertainty-Aware Segmentation Quality Prediction via Deep Learning Bayesian Modeling: Comprehensive Evaluation and Interpretation on Skin Cancer and Liver Segmentation
- URL: http://arxiv.org/abs/2508.01460v1
- Date: Sat, 02 Aug 2025 18:30:32 GMT
- Title: Uncertainty-Aware Segmentation Quality Prediction via Deep Learning Bayesian Modeling: Comprehensive Evaluation and Interpretation on Skin Cancer and Liver Segmentation
- Authors: Sikha O K, Meritxell Riera-Marín, Adrian Galdran, Javier García Lopez, Julia Rodríguez-Comas, Gemma Piella, Miguel A. González Ballester
- Abstract summary: We propose a novel framework for predicting segmentation quality without requiring ground truth annotations at test time. Our framework achieves an R2 score of 93.25 and a Pearson correlation of 96.58 on the HAM10000 dataset. For 3D liver segmentation, Test Time Augmentation with entropy achieves an R2 score of 85.03 and a Pearson correlation of 65.02, demonstrating cross-modality robustness.
- Score: 1.428446217085158
- License: http://creativecommons.org/licenses/by/4.0/
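The R2 and Pearson figures quoted in the summary measure how well a predicted quality score tracks the true Dice coefficient across a test set. A minimal NumPy sketch of both metrics follows; the function name is illustrative and not from the paper:

```python
import numpy as np

def quality_prediction_metrics(dice_true, dice_pred):
    """R^2 score and Pearson correlation between true and predicted Dice values."""
    dice_true = np.asarray(dice_true, dtype=float)
    dice_pred = np.asarray(dice_pred, dtype=float)
    # R^2: 1 minus residual sum of squares over total sum of squares
    ss_res = np.sum((dice_true - dice_pred) ** 2)
    ss_tot = np.sum((dice_true - dice_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # Pearson correlation from the off-diagonal of the correlation matrix
    pearson = np.corrcoef(dice_true, dice_pred)[0, 1]
    return r2, pearson
```

Note that the two metrics answer different questions: Pearson correlation rewards any monotone linear relationship, while R2 also penalizes systematic offset, which is why the liver results can combine a high R2 with a lower correlation only when scores are compared on matched scales.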
- Abstract: Image segmentation is a critical step in computational biomedical image analysis, typically evaluated using metrics like the Dice coefficient during training and validation. However, in clinical settings without manual annotations, assessing segmentation quality becomes challenging, and models lacking reliability indicators face adoption barriers. To address this gap, we propose a novel framework for predicting segmentation quality without requiring ground truth annotations at test time. Our approach introduces two complementary frameworks: one leveraging predicted segmentation and uncertainty maps, and another integrating the original input image, uncertainty maps, and predicted segmentation maps. We present Bayesian adaptations of two benchmark segmentation models, SwinUNet and Feature Pyramid Network with ResNet50, using Monte Carlo Dropout, Ensemble, and Test Time Augmentation to quantify uncertainty. We evaluate four uncertainty estimates: confidence map, entropy, mutual information, and expected pairwise Kullback-Leibler divergence on 2D skin lesion and 3D liver segmentation datasets, analyzing their correlation with segmentation quality metrics. Our framework achieves an R2 score of 93.25 and a Pearson correlation of 96.58 on the HAM10000 dataset, outperforming previous segmentation quality assessment methods. For 3D liver segmentation, Test Time Augmentation with entropy achieves an R2 score of 85.03 and a Pearson correlation of 65.02, demonstrating cross-modality robustness. Additionally, we propose an aggregation strategy that combines multiple uncertainty estimates into a single score per image, offering a more robust and comprehensive assessment of segmentation quality. Finally, we use Grad-CAM and UMAP-based embedding analysis to interpret the model's behavior and reliability, highlighting the impact of uncertainty integration.
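The four uncertainty estimates named in the abstract can all be derived from the same set of stochastic forward passes (Monte Carlo Dropout, ensemble members, or Test Time Augmentation variants). The sketch below is an illustrative binary-segmentation version, not the paper's implementation; shapes and the function name are assumptions:

```python
import numpy as np

def uncertainty_maps(probs):
    """Per-pixel uncertainty maps from T stochastic forward passes.

    probs: array of shape (T, H, W) holding foreground probabilities,
    e.g. T Monte Carlo Dropout samples. Returns four (H, W) maps:
    confidence, entropy, mutual information, and expected pairwise
    KL divergence (EPKL).
    """
    eps = 1e-12
    p = np.stack([probs, 1.0 - probs], axis=-1)   # (T, H, W, 2) class probs
    p_mean = p.mean(axis=0)                       # mean prediction, (H, W, 2)

    # Confidence: probability of the most likely class
    confidence = p_mean.max(axis=-1)

    # Predictive entropy of the mean distribution
    entropy = -(p_mean * np.log(p_mean + eps)).sum(axis=-1)

    # Mutual information = predictive entropy - expected per-sample entropy
    exp_entropy = -(p * np.log(p + eps)).sum(axis=-1).mean(axis=0)
    mutual_info = entropy - exp_entropy

    # Expected pairwise KL divergence across the T samples
    T = p.shape[0]
    epkl = np.zeros(p.shape[1:3])
    for i in range(T):
        for j in range(T):
            epkl += (p[i] * (np.log(p[i] + eps) - np.log(p[j] + eps))).sum(axis=-1)
    epkl /= T * T
    return confidence, entropy, mutual_info, epkl
```

Entropy captures total predictive uncertainty, while mutual information and EPKL isolate disagreement between the samples, which is why they are often read as model (epistemic) uncertainty.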
Related papers
- Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images [22.36128130052757]
We build a ground-truth-free evaluation model to assess the quality of segmentations generated by the Segment Anything Model (SAM) and its variants in medical imaging.
This evaluation model estimates segmentation quality scores by analyzing the coherence and consistency between the input images and their corresponding segmentation predictions.
arXiv Detail & Related papers (2024-09-23T10:12:08Z) - Towards Better Certified Segmentation via Diffusion Models [62.21617614504225]
Segmentation models can be vulnerable to adversarial perturbations, which hinders their use in decision-critical systems like healthcare or autonomous driving.
Recently, randomized smoothing has been proposed to certify segmentation predictions by adding Gaussian noise to the input to obtain theoretical guarantees.
In this paper, we address the problem of certifying segmentation prediction using a combination of randomized smoothing and diffusion models.
arXiv Detail & Related papers (2023-06-16T16:30:39Z) - A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy [3.5752677591512487]
This work uses cardiac substructure segmentation as an example task to establish a quality assurance framework.
A benchmark dataset consisting of Computed Tomography (CT) images along with manual cardiac delineations of 241 patients was collected.
An image domain shift detector was developed using a trained denoising autoencoder (DAE) and two hand-engineered features.
A regression model was trained to predict per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC).
arXiv Detail & Related papers (2023-05-19T14:51:05Z) - Evaluating Probabilistic Classifiers: The Triptych [62.997667081978825]
We propose and study a triptych of diagnostic graphics that focus on distinct and complementary aspects of forecast performance.
The reliability diagram addresses calibration, the receiver operating characteristic (ROC) curve diagnoses discrimination ability, and the Murphy diagram visualizes overall predictive performance and value.
arXiv Detail & Related papers (2023-01-25T19:35:23Z) - Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DEviS incorporates an uncertainty-aware filtering module, which uses an uncertainty-calibrated error metric to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - Trustworthy Medical Segmentation with Uncertainty Estimation [0.7829352305480285]
This paper introduces a new Bayesian deep learning framework for uncertainty quantification in segmentation neural networks.
We evaluate the proposed framework on medical image segmentation data from Magnetic Resonance Imaging and Computed Tomography scans.
Our experiments on multiple benchmark datasets demonstrate that the proposed framework is more robust to noise and adversarial attacks as compared to state-of-the-art segmentation models.
arXiv Detail & Related papers (2021-11-10T22:46:05Z) - Inconsistency-aware Uncertainty Estimation for Semi-supervised Medical Image Segmentation [92.9634065964963]
We present a new semi-supervised segmentation model, namely the conservative-radical network (CoraNet), based on our uncertainty estimation and separate self-training strategy.
Compared with the current state of the art, our CoraNet has demonstrated superior performance.
arXiv Detail & Related papers (2021-10-17T08:49:33Z) - Uncertainty Quantification in Medical Image Segmentation with Multi-decoder U-Net [3.961279440272763]
We exploit the medical image segmentation uncertainty by measuring segmentation performance with multiple annotations in a supervised learning manner.
We propose a U-Net based architecture with multiple decoders, where the image representation is encoded with the same encoder, and segmentation referring to each annotation is estimated with multiple decoders.
The proposed architecture is trained in an end-to-end manner and is able to improve predictive uncertainty estimates.
arXiv Detail & Related papers (2021-09-15T01:46:29Z) - An Uncertainty-Driven GCN Refinement Strategy for Organ Segmentation [53.425900196763756]
We propose a segmentation refinement method based on uncertainty analysis and graph convolutional networks.
We employ the uncertainty levels of the convolutional network in a particular input volume to formulate a semi-supervised graph learning problem.
We show that our method outperforms the state-of-the-art CRF refinement method by improving the Dice score by 1% for the pancreas and 2% for the spleen.
arXiv Detail & Related papers (2020-12-06T18:55:07Z) - Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks, called AEP-Net, for the error map prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
The AEP-Net achieves average DSCs of 0.8358 and 0.8164 on the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.