The impact of using voxel-level segmentation metrics on evaluating
multifocal prostate cancer localisation
- URL: http://arxiv.org/abs/2203.16415v2
- Date: Thu, 31 Mar 2022 02:19:37 GMT
- Title: The impact of using voxel-level segmentation metrics on evaluating
multifocal prostate cancer localisation
- Authors: Wen Yan and Qianye Yang and Tom Syer and Zhe Min and Shonit Punwani
and Mark Emberton and Dean C. Barratt and Bernard Chiu and Yipeng Hu
- Abstract summary: Dice similarity coefficient (DSC) and Hausdorff distance (HD) are widely used for evaluating medical image segmentation.
This work first proposes a new asymmetric detection metric, adapting those used in object detection, for planning prostate cancer procedures.
We report pairwise agreement and correlation 1) between DSC and HD, and 2) between voxel-level DSC and recall-controlled precision at lesion-level.
- Score: 8.035409264165937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dice similarity coefficient (DSC) and Hausdorff distance (HD) are widely used
for evaluating medical image segmentation. They have also been criticised, when
reported alone, for their unclear or even misleading clinical interpretation.
DSCs may also differ substantially from HDs, due to boundary smoothness or
multiple regions of interest (ROIs) within a subject. More importantly, either
metric can also have a nonlinear, non-monotonic relationship with outcomes
based on Type 1 and 2 errors, designed for specific clinical decisions that use
the resulting segmentation. Cases causing disagreement between these
metrics are not difficult to postulate. This work first proposes a new
asymmetric detection metric, adapting those used in object detection, for
planning prostate cancer procedures. The lesion-level metric is then compared
with the voxel-level DSC and HD, where a 3D UNet is used for segmenting
lesions from multiparametric MR (mpMR) images. Based on experimental results we
report pairwise agreement and correlation 1) between DSC and HD, and 2) between
voxel-level DSC and recall-controlled precision at lesion-level, with Cohen's
kappa in [0.49, 0.61] and Pearson's correlation in [0.66, 0.76] (p-values < 0.001) at varying cut-offs.
However, the differences in false-positives and false-negatives, between the
actual errors and the perceived counterparts if DSC is used, can be as high as
152 and 154, respectively, out of the 357 test set lesions. We therefore
carefully conclude that, despite the significant correlations, voxel-level
metrics such as DSC can misrepresent lesion-level detection accuracy for
evaluating localisation of multifocal prostate cancer and should be interpreted
with caution.
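The gap between voxel-level and lesion-level views described in the abstract can be illustrated with a toy example. The sketch below is not the asymmetric metric proposed in the paper; it assumes a much simpler lesion-level rule (a ground-truth lesion counts as detected if any predicted voxel overlaps it) and uses 2D masks stored as nested lists. The function names `dice`, `components` and `lesion_recall` are hypothetical helpers introduced only for this illustration.

```python
from collections import deque


def dice(pred, truth):
    """Voxel-level Dice similarity coefficient between two binary masks."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 2.0 * inter / total if total else 1.0


def components(mask):
    """4-connected components of a 2D binary mask, each as a set of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, comps = set(), []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and (r, c) not in seen:
                comp, queue = set(), deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def lesion_recall(pred, truth):
    """Fraction of ground-truth lesions overlapped by at least one predicted voxel.

    Assumption: any overlap counts as a detection; the paper's metric differs.
    """
    predicted = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v}
    lesions = components(truth)
    hit = sum(1 for lesion in lesions if lesion & predicted)
    return hit / len(lesions) if lesions else 1.0


# Toy ground truth: one large lesion (16 voxels) and one small lesion (1 voxel).
truth = [[0] * 8 for _ in range(6)]
for r in range(1, 5):
    for c in range(0, 4):
        truth[r][c] = 1
truth[2][6] = 1

# Toy prediction: the large lesion is segmented perfectly, the small one is missed.
pred = [row[:] for row in truth]
pred[2][6] = 0

print(f"voxel-level DSC: {dice(pred, truth):.3f}")
print(f"lesion-level recall: {lesion_recall(pred, truth):.2f}")
```

In this toy case the DSC is 32/33 (about 0.97) even though one of the two lesions is missed entirely, which is the kind of mismatch the abstract warns about for multifocal disease.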
Related papers
- Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks [60.80828925396154]
We present Connected-Component(CC)-Metrics, a novel semantic segmentation evaluation protocol.
We motivate this setup in the common medical scenario of semantic segmentation in a full-body PET/CT.
We show how existing semantic segmentation metrics suffer from a bias towards larger connected components.
arXiv Detail & Related papers (2024-10-24T12:26:05Z)
- Accurate Fine-Grained Segmentation of Human Anatomy in Radiographs via Volumetric Pseudo-Labeling [66.75096111651062]
We created a large-scale dataset of 10,021 thoracic CTs with 157 labels.
We applied an ensemble of 3D anatomy segmentation models to extract anatomical pseudo-labels.
Our resulting segmentation models demonstrated remarkable performance on CXR.
arXiv Detail & Related papers (2023-06-06T18:01:08Z)
- Tackling Bias in the Dice Similarity Coefficient: Introducing nDSC for White Matter Lesion Segmentation [10.182222073140991]
The Dice Similarity Coefficient (DSC) is a popular choice for comparing the agreement between the predicted segmentation against a ground-truth mask.
The DSC metric has been shown to be biased to the occurrence rate of the positive class in the ground-truth.
This work describes a detailed analysis of the recently proposed normalised DSC for binary segmentation tasks.
arXiv Detail & Related papers (2023-02-10T18:48:13Z)
- Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DEviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z)
- Novel structural-scale uncertainty measures and error retention curves: application to multiple sclerosis [9.295643448425182]
This paper focuses on uncertainty estimation for white matter lesion (WML) segmentation in magnetic resonance imaging (MRI).
On one side, voxel-scale segmentation errors cause the erroneous delineation of the lesions; on the other side, lesion-scale detection errors lead to wrong lesion counts.
This work aims to compare the ability of different voxel- and lesion-scale uncertainty measures to capture errors related to segmentation and lesion detection, respectively.
arXiv Detail & Related papers (2022-11-09T11:53:29Z)
- Adaptive Contrastive Learning with Dynamic Correlation for Multi-Phase Organ Segmentation [25.171694372205774]
We propose a novel data-driven contrastive loss function that adapts the similar/dissimilar contrast relationship between samples in each minibatch at organ-level.
We evaluate our proposed approach on multi-organ segmentation with both non-contrast CT datasets and the MICCAI 2015 BTCV Challenge contrast-enhanced CT datasets.
arXiv Detail & Related papers (2022-10-16T22:38:30Z)
- Corneal endothelium assessment in specular microscopy images with Fuchs' dystrophy via deep regression of signed distance maps [48.498376125522114]
This paper proposes a UNet-based segmentation approach that requires minimal post-processing.
It achieves reliable CE morphometric assessment and guttae identification across all degrees of Fuchs' dystrophy.
arXiv Detail & Related papers (2022-10-13T15:34:20Z)
- Comparison of Evaluation Metrics for Landmark Detection in CMR Images [0.8219153654616499]
We extend the public ACDC dataset with additional labels of the right ventricular insertion points.
We compare different variants of a heatmap-based landmark detection pipeline.
Preliminary results indicate that a combination of different metrics is necessary.
arXiv Detail & Related papers (2022-01-25T15:58:30Z)
- Multiple Sclerosis Lesions Identification/Segmentation in Magnetic Resonance Imaging using Ensemble CNN and Uncertainty Classification [7.260554897161948]
We present an automated framework for MS lesions identification/segmentation based on three pivotal concepts.
The proposed framework is trained, validated and tested on the 2016 MSSEG benchmark public data set.
Results are also shown for the uncertainty, though a comparison with the other raters is impossible.
arXiv Detail & Related papers (2021-08-26T13:48:06Z)
- Controlling False Positive/Negative Rates for Deep-Learning-Based Prostate Cancer Detection on Multiparametric MR images [58.85481248101611]
We propose a novel PCa detection network that incorporates a lesion-level cost-sensitive loss and an additional slice-level loss based on a lesion-to-slice mapping function.
Our experiments based on 290 clinical patients conclude that 1) the lesion-level FNR was effectively reduced from 0.19 to 0.10 and the lesion-level FPR was reduced from 1.03 to 0.66 by changing the lesion-level cost.
arXiv Detail & Related papers (2021-06-04T09:51:27Z)
- Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks called AEP-Net for the error prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
The AEP-Net achieves an average DSC of 0.8358, 0.8164 for the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)