Clinical Interpretability of Deep Learning Segmentation Through Shapley-Derived Agreement and Uncertainty Metrics
- URL: http://arxiv.org/abs/2512.07224v1
- Date: Mon, 08 Dec 2025 07:06:58 GMT
- Title: Clinical Interpretability of Deep Learning Segmentation Through Shapley-Derived Agreement and Uncertainty Metrics
- Authors: Tianyi Ren, Daniel Low, Pittra Jaengprajak, Juampablo Heras Rivera, Jacob Ruzevick, Mehmet Kurt
- Abstract summary: Deep learning models have achieved remarkable performance in medical image segmentation. The need for explainability remains critical for ensuring their acceptance and integration in clinical practice. Our approach explored the use of contrast-level Shapley values to assess feature importance.
- Score: 0.05473229173811305
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Segmentation is the identification of anatomical regions of interest, such as organs, tissue, and lesions, serving as a fundamental task in computer-aided diagnosis in medical imaging. Although deep learning models have achieved remarkable performance in medical image segmentation, the need for explainability remains critical for ensuring their acceptance and integration in clinical practice, despite the growing research attention in this area. Our approach explored the use of contrast-level Shapley values, a systematic perturbation of model inputs to assess feature importance. While other studies have investigated gradient-based techniques through identifying influential regions in imaging inputs, Shapley values offer a broader, clinically aligned approach, explaining how model performance is fairly attributed to certain imaging contrasts over others. Using the BraTS 2024 dataset, we generated rankings for Shapley values for four MRI contrasts across four model architectures. Two metrics were proposed from the Shapley ranking: agreement between model and "clinician" imaging ranking, and uncertainty quantified through Shapley ranking variance across cross-validation folds. Higher-performing cases (Dice > 0.6) showed significantly greater agreement with clinical rankings. Increased Shapley ranking variance correlated with decreased performance (U-Net: $r=-0.581$). These metrics provide clinically interpretable proxies for model reliability, helping clinicians better understand state-of-the-art segmentation models.
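The two ranking-based metrics described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The contrast names, the additive `toy_dice` value function, the placeholder clinician ranking, and the use of Spearman correlation as the agreement measure are all assumptions made for illustration; in practice the value function would be the Dice score of a trained model evaluated with only a subset of contrasts available.

```python
from itertools import combinations
from math import factorial
from statistics import pvariance

# Assumed names for the four MRI contrasts (not taken from the abstract).
CONTRASTS = ("T1", "T1c", "T2", "FLAIR")

# Hypothetical per-contrast Dice gains, used only to build a toy value function.
GAINS = {"T1": 0.05, "T1c": 0.40, "T2": 0.10, "FLAIR": 0.25}


def toy_dice(subset):
    """Stand-in for 'Dice score when only `subset` of contrasts is provided'."""
    return sum(GAINS[c] for c in subset)


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a score."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def ranking(phi):
    """Map each contrast to its rank; rank 1 is the largest Shapley value."""
    ordered = sorted(phi, key=phi.get, reverse=True)
    return {p: i + 1 for i, p in enumerate(ordered)}


def rank_agreement(rank_a, rank_b):
    """Spearman correlation between two tie-free rankings (assumed metric)."""
    n = len(rank_a)
    d2 = sum((rank_a[p] - rank_b[p]) ** 2 for p in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


def rank_variance(fold_rankings):
    """Mean over contrasts of the rank variance across cross-validation folds."""
    per_contrast = [pvariance([r[p] for r in fold_rankings])
                    for p in fold_rankings[0]]
    return sum(per_contrast) / len(per_contrast)


# Placeholder clinician ranking, purely illustrative.
CLINICIAN_RANK = {"T1c": 1, "FLAIR": 2, "T2": 3, "T1": 4}

model_rank = ranking(shapley_values(CONTRASTS, toy_dice))
agreement = rank_agreement(model_rank, CLINICIAN_RANK)
```

Because `toy_dice` is additive, each contrast's Shapley value equals its assumed gain, which makes the sketch easy to check by hand. With a real model the value function is not additive, and the exact computation requires evaluating all 16 contrast subsets for four contrasts.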
Related papers
- Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data [76.89269238957593]
Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views. We propose SegmentMIL, a transformer-based multi-view multiple-instance learning framework for patient-level stenosis classification.
arXiv Detail & Related papers (2026-02-02T13:07:52Z) - Prompt-Aware Adaptive Elastic Weight Consolidation for Continual Learning in Medical Vision-Language Models [0.0]
Medical vision-language models must preserve cross-modal alignments between medical images and clinical terminology. We introduce Prompt-Aware Adaptive Elastic Weight Consolidation (PA-EWC), a novel continual learning approach. Experimental results demonstrate that PA-EWC reduces catastrophic forgetting by up to 17.58% compared to baseline methods.
arXiv Detail & Related papers (2025-11-25T12:22:56Z) - From Classification to Cross-Modal Understanding: Leveraging Vision-Language Models for Fine-Grained Renal Pathology [9.268389327736735]
We model fine-grained glomerular subtyping as a clinically realistic few-shot problem. We evaluate both pathology-specialized and general-purpose vision-language models under this setting.
arXiv Detail & Related papers (2025-11-15T01:44:11Z) - Expert-Guided Explainable Few-Shot Learning for Medical Image Diagnosis [2.7946918847372277]
We propose an expert-guided explainable few-shot learning framework that integrates radiologist-provided regions of interest into model training. We evaluate our framework on two distinct datasets: BraTS (MRI) and VinDr-CXR (Chest X-ray). Our findings demonstrate the effectiveness of incorporating expert-guided attention supervision to bridge the gap between performance and interpretability in few-shot medical image diagnosis.
arXiv Detail & Related papers (2025-09-08T05:31:37Z) - Gradient Attention Map Based Verification of Deep Convolutional Neural Networks with Application to X-ray Image Datasets [1.0208529247755187]
We propose a comprehensive verification framework that evaluates model suitability through multiple complementary strategies. First, we introduce a Gradient Attention Map (GAM)-based approach that analyzes attention patterns using Grad-CAM. Second, we extend verification to early convolutional feature maps, capturing structural misalignments missed by attention alone. Third, we incorporate an additional garbage class into the classification model to explicitly reject out-of-distribution inputs.
arXiv Detail & Related papers (2025-04-29T23:41:37Z) - Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
arXiv Detail & Related papers (2023-10-04T21:57:09Z) - Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z) - Towards a Guideline for Evaluation Metrics in Medical Image Segmentation [0.0]
This work provides an overview and interpretation guide on the following metrics for medical image segmentation evaluation in binary.
As a summary, we propose a guideline for standardized medical image segmentation evaluation.
arXiv Detail & Related papers (2022-02-10T13:38:05Z) - Systematic Clinical Evaluation of A Deep Learning Method for Medical Image Segmentation: Radiosurgery Application [48.89674088331313]
We systematically evaluate a Deep Learning (DL) method in a 3D medical image segmentation task.
Our method is integrated into the radiosurgery treatment process and directly impacts the clinical workflow.
arXiv Detail & Related papers (2021-08-21T16:15:40Z) - Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogeneous and adaptive segmentation (CHASe).
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
arXiv Detail & Related papers (2020-05-27T06:58:39Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)