Towards a Guideline for Evaluation Metrics in Medical Image Segmentation
- URL: http://arxiv.org/abs/2202.05273v1
- Date: Thu, 10 Feb 2022 13:38:05 GMT
- Title: Towards a Guideline for Evaluation Metrics in Medical Image Segmentation
- Authors: Dominik Müller, Iñaki Soto-Rey and Frank Kramer
- Abstract summary: This work provides an overview and interpretation guide on the following metrics for medical image segmentation evaluation in binary as well as multi-class problems.
As a summary, we propose a guideline for standardized medical image segmentation evaluation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last decade, research on artificial intelligence has seen rapid growth
with deep learning models, especially in the field of medical image
segmentation. Various studies demonstrated that these models have powerful
prediction capabilities and achieved results similar to those of clinicians. However,
recent studies revealed that the evaluation in image segmentation studies lacks
reliable model performance assessment and showed statistical bias due to incorrect
metric implementation or usage. Thus, this work provides an overview and
interpretation guide on the following metrics for medical image segmentation
evaluation in binary as well as multi-class problems: Dice similarity
coefficient, Jaccard, Sensitivity, Specificity, Rand index, ROC curves, Cohen's
Kappa, and Hausdorff distance. As a summary, we propose a guideline for
standardized medical image segmentation evaluation to improve evaluation
quality, reproducibility, and comparability in the research field.
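The metrics named in the abstract can be illustrated with a minimal, dependency-free Python sketch for binary 2D masks. It is not the authors' implementation: the function names (`confusion_counts`, `binary_metrics`, `hausdorff_distance`) are hypothetical, and several conventions are assumptions, namely returning 0.0 for a zero denominator, computing the Hausdorff distance by brute force over all foreground pixel coordinates (other tools use boundary points or a percentile variant), and summarizing a single hard prediction's ROC curve by the area under its one-point polyline. Pixel accuracy and the classical pair-counting Rand index are reported separately because they are distinct quantities.

```python
import math
from typing import Dict, List, Tuple

Mask = List[List[int]]  # 2D binary mask: 1 = foreground, 0 = background


def confusion_counts(gt: Mask, pred: Mask) -> Tuple[int, int, int, int]:
    """Pixel-wise (TP, FP, TN, FN) of a predicted mask against the ground truth."""
    g = [v for row in gt for v in row]
    p = [v for row in pred for v in row]
    if len(g) != len(p):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for a, b in zip(g, p) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(g, p) if a == 0 and b == 1)
    tn = sum(1 for a, b in zip(g, p) if a == 0 and b == 0)
    fn = sum(1 for a, b in zip(g, p) if a == 1 and b == 0)
    return tp, fp, tn, fn


def _div(num: float, den: float) -> float:
    # Assumed convention: 0.0 for an empty denominator (toolkits differ here).
    return num / den if den else 0.0


def binary_metrics(gt: Mask, pred: Mask) -> Dict[str, float]:
    tp, fp, tn, fn = confusion_counts(gt, pred)
    n = tp + fp + tn + fn
    sens = _div(tp, tp + fn)  # sensitivity = true positive rate (recall)
    spec = _div(tn, tn + fp)  # specificity = true negative rate
    p_o = _div(tp + tn, n)    # observed agreement (pixel accuracy)
    # Chance agreement for Cohen's kappa: P(both say 1) + P(both say 0).
    p_e = _div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    # Pair-counting Rand index over all pixel pairs, from the 2x2 contingency table.
    pairs = math.comb(n, 2)
    same_both = sum(math.comb(c, 2) for c in (tp, fp, tn, fn))
    same_gt = math.comb(tp + fn, 2) + math.comb(tn + fp, 2)
    same_pred = math.comb(tp + fp, 2) + math.comb(tn + fn, 2)
    return {
        "dice": _div(2 * tp, 2 * tp + fp + fn),
        "jaccard": _div(tp, tp + fp + fn),
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": p_o,
        "rand_index": _div(pairs + 2 * same_both - same_gt - same_pred, pairs),
        # A hard prediction gives one ROC point; the area under the polyline
        # (0,0) -> (1 - spec, sens) -> (1,1) equals (sens + spec) / 2.
        "auc_single_point": (sens + spec) / 2,
        "kappa": _div(p_o - p_e, 1 - p_e),
    }


def hausdorff_distance(gt: Mask, pred: Mask) -> float:
    """Symmetric Hausdorff distance between foreground pixel coordinates (brute force)."""
    a = [(i, j) for i, row in enumerate(gt) for j, v in enumerate(row) if v == 1]
    b = [(i, j) for i, row in enumerate(pred) for j, v in enumerate(row) if v == 1]
    if not a and not b:
        return 0.0  # assumed convention: two empty masks coincide
    if not a or not b:
        return math.inf  # assumed convention: undefined against an empty mask

    def directed(x, y):
        # Largest distance from a point in x to its nearest neighbour in y.
        return max(min(math.dist(s, t) for t in y) for s in x)

    return max(directed(a, b), directed(b, a))


gt = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 1, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
print(binary_metrics(gt, pred))
print(hausdorff_distance(gt, pred))
```

For the toy masks the counts are TP=3, FP=1, TN=7, FN=1, which gives a Dice of 0.75, a Jaccard of 0.6, a Cohen's kappa of 0.625, and a Hausdorff distance of 1.0 pixel. For multi-class masks, one possible extension (an assumption, not necessarily the paper's prescription) is to binarize each class one-vs-rest, call `binary_metrics` per class, and report the per-class scores alongside their mean.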
Related papers
- EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement [14.724629346280402]
Generative models have achieved significant success in enhancing fundus images.
The evaluation of these models still presents a considerable challenge.
We propose a novel comprehensive benchmark, EyeBench, to provide insights that align enhancement models with clinical needs.
(2025-02-20T04:56:03Z)
- The Skin Game: Revolutionizing Standards for AI Dermatology Model Comparison [0.6144680854063939]
Deep Learning approaches in dermatological image classification have shown promising results, yet the field faces significant methodological challenges that impede proper evaluation.
This paper presents a systematic analysis of current methodological practices in skin disease classification research, revealing substantial inconsistencies in data preparation, augmentation strategies, and performance reporting.
We propose comprehensive methodological recommendations for model development, evaluation, and clinical deployment, emphasizing rigorous data preparation, systematic error analysis, and specialized protocols for different image types.
(2025-02-04T17:15:36Z)
- Pitfalls of topology-aware image segmentation [81.19923502845441]
We identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts, and inappropriate use of evaluation metrics.
We propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.
(2024-12-19T08:11:42Z)
- Content-Based Image Retrieval for Multi-Class Volumetric Radiology Images: A Benchmark Study [0.6249768559720122]
We benchmark embeddings derived from pre-trained supervised models on medical images against embeddings derived from pre-trained unsupervised models on non-medical images.
For volumetric image retrieval, we adopt a late interaction re-ranking method inspired by text matching.
(2024-05-15T13:34:07Z)
- Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation [47.250147322130545]
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images.
Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists.
We present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes.
(2023-11-18T14:37:53Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
(2023-02-03T13:50:25Z)
- Evaluation of importance estimators in deep learning classifiers for Computed Tomography [1.6710577107094642]
Interpretability of deep neural networks often relies on estimating the importance of input features.
Two versions of SmoothGrad topped the fidelity and ROC rankings, whereas both Integrated Gradients and SmoothGrad excelled in DSC evaluation.
There was a critical discrepancy between model-centric (fidelity) and human-centric (ROC and DSC) evaluation.
(2022-09-30T11:57:25Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of the feature domains of the same class.
(2020-12-10T04:01:07Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging prediction consistency for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
(2020-05-15T06:57:54Z)
- Weakly supervised multiple instance learning histopathological tumor segmentation [51.085268272912415]
We propose a weakly supervised framework for whole slide imaging segmentation.
We exploit a multiple instance learning scheme for training models.
The proposed framework has been evaluated on multi-location, multi-centric public data from The Cancer Genome Atlas and the PatchCamelyon dataset.
(2020-04-10T13:12:47Z)