Understanding implementation pitfalls of distance-based metrics for image segmentation
- URL: http://arxiv.org/abs/2410.02630v2
- Date: Thu, 31 Jul 2025 12:31:02 GMT
- Title: Understanding implementation pitfalls of distance-based metrics for image segmentation
- Authors: Gašper Podobnik, Tomaž Vrtovec
- Abstract summary: Distance-based metrics are widely used to validate segmentation performance in (bio)medical imaging. Their implementation is complex, and critical differences across open-source tools remain largely unrecognized by the community. This study systematically dissects 11 open-source tools that implement distance-based metric computation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distance-based metrics, such as the Hausdorff distance (HD), are widely used to validate segmentation performance in (bio)medical imaging. However, their implementation is complex, and critical differences across open-source tools remain largely unrecognized by the community. These discrepancies undermine benchmarking efforts, introduce bias into biomarker calculations, and potentially distort medical device development and clinical commissioning. In this study, we systematically dissect 11 open-source tools that implement distance-based metric computation by performing both a conceptual analysis of their computational steps and an empirical analysis on representative two- and three-dimensional image datasets. Alarmingly, we observed deviations in HD exceeding 100 mm and identified multiple statistically significant differences between tools, demonstrating that statistically significant improvements on the same set of segmentations can be achieved simply by selecting a particular implementation. These findings cast doubt on the validity of prior cross-study comparisons that do not account for differences in metric implementations. To address this, we provide practical recommendations for tool selection; additionally, our conceptual analysis informs the future evolution of open-source tool implementations.
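The kind of implementation sensitivity the abstract describes can be illustrated with a minimal sketch (not taken from the paper; the point sets, spacing values, and helper names below are made up for illustration): the same pair of boundary point sets yields different numbers depending on whether the classic maximum or a 95th-percentile Hausdorff distance is computed, and on whether voxel spacing is applied.

```python
# Hedged sketch: two common Hausdorff-distance variants on the same toy
# boundary point sets, showing how implementation choices alone shift the value.
import numpy as np

def pairwise_dists(a, b):
    # Euclidean distance between every point in a and every point in b
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def hausdorff(a, b, percentile=100.0):
    # Symmetric (percentile) Hausdorff distance between point sets a and b
    d = pairwise_dists(a, b)
    d_ab = d.min(axis=1)  # for each point in a, distance to nearest point in b
    d_ba = d.min(axis=0)  # and vice versa
    return max(np.percentile(d_ab, percentile), np.percentile(d_ba, percentile))

# Made-up 2D boundary points in index (pixel) coordinates
ref = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 10.0]])
pred = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])

hd100 = hausdorff(ref, pred)        # classic HD: dominated by the outlier point
hd95 = hausdorff(ref, pred, 95.0)   # HD95: partially suppresses the outlier

# Ignoring anisotropic voxel spacing is another common divergence: the same
# index coordinates scaled to millimetres give yet another value.
spacing = np.array([0.5, 2.0])      # made-up (row, column) spacing in mm
hd_mm = hausdorff(ref * spacing, pred * spacing)
```

Real tools diverge in further steps as well, such as how boundary points are extracted from a voxel mask and how the percentile is defined, which is exactly the source of the discrepancies the study quantifies.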
Related papers
- Uncertainty evaluation of segmentation models for Earth observation [4.350621291554061]
This paper investigates methods for estimating uncertainty in semantic segmentation predictions derived from satellite imagery. Our evaluation focuses on the practical utility of uncertainty measures, testing their ability to identify prediction errors and noise-corrupted input image regions.
arXiv Detail & Related papers (2025-10-22T13:39:28Z)
- Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach [118.75896764188424]
We present a novel perspective to expose the inherent size sensitivity of existing widely used Salient Object Detection metrics. To address this challenge, a generic Size-Invariant Evaluation (SIEva) framework is proposed. We further develop a dedicated optimization framework (SIOpt), which adheres to the size-invariant principle and significantly enhances the detection of salient objects across a broad range of sizes.
arXiv Detail & Related papers (2025-09-19T04:12:14Z)
- MeshMetrics: A Precise Implementation of Distance-Based Image Segmentation Metrics [0.0]
MeshMetrics is a mesh-based framework that provides a more precise computation of distance-based metrics than conventional grid-based approaches. We demonstrate that MeshMetrics achieves higher accuracy and precision than established tools, and is substantially less affected by discretization artifacts.
arXiv Detail & Related papers (2025-09-06T10:16:40Z)
- Metrics for Inter-Dataset Similarity with Example Applications in Synthetic Data and Feature Selection Evaluation -- Extended Version [1.6863735232819916]
Existing methods for measuring inter-dataset similarity are computationally expensive, limited, or sensitive to different entities.
We propose two novel metrics for measuring inter-dataset similarity.
arXiv Detail & Related papers (2025-01-16T15:17:27Z)
- Evaluating Representational Similarity Measures from the Lens of Functional Correspondence [1.7811840395202345]
Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data.
Despite the widespread use of representational comparisons, a critical question remains: which metrics are most suitable for these comparisons?
arXiv Detail & Related papers (2024-11-21T23:53:58Z)
- Every Component Counts: Rethinking the Measure of Success for Medical Semantic Segmentation in Multi-Instance Segmentation Tasks [60.80828925396154]
We present Connected-Component (CC)-Metrics, a novel semantic segmentation evaluation protocol.
We motivate this setup in the common medical scenario of semantic segmentation in a full-body PET/CT.
We show how existing semantic segmentation metrics suffer from a bias towards larger connected components.
arXiv Detail & Related papers (2024-10-24T12:26:05Z)
- Segmentation Quality and Volumetric Accuracy in Medical Imaging [0.9426448361599084]
Current medical image segmentation relies on region-based (Dice, F1-score) and boundary-based (Hausdorff distance, surface distance) metrics as the de facto standard.
While these metrics are widely used, they lack a unified interpretation, particularly regarding volume agreement.
We utilize relative volume prediction error (vpe) to directly assess the accuracy of volume predictions derived from segmentation tasks.
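A toy illustration of the volume-focused view described above (a sketch under assumed definitions, not taken from the paper: the masks are made up, and vpe is taken here to be the signed relative volume error (V_pred - V_gt) / V_gt):

```python
# Hedged sketch: Dice overlap vs. relative volume prediction error (vpe,
# assumed definition) on toy binary masks.
import numpy as np

def dice(pred, gt):
    # Region overlap: 2|A ∩ B| / (|A| + |B|)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def vpe(pred, gt, voxel_volume=1.0):
    # Assumed definition: signed relative volume error (V_pred - V_gt) / V_gt
    v_pred = pred.sum() * voxel_volume
    v_gt = gt.sum() * voxel_volume
    return (v_pred - v_gt) / v_gt

gt = np.zeros((10, 10), dtype=bool); gt[2:8, 2:8] = True      # 36 pixels
pred = np.zeros((10, 10), dtype=bool); pred[3:8, 2:8] = True  # 30 pixels

d = dice(pred, gt)   # stays high despite systematic under-segmentation
v = vpe(pred, gt)    # negative: volume bias the overlap metric hides
```

The point of the contrast is that a high Dice score can coexist with a consistent volume bias, which a direct volume-error measure exposes.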
arXiv Detail & Related papers (2024-04-27T00:49:39Z)
- Strategies to Improve Real-World Applicability of Laparoscopic Anatomy Segmentation Models [6.8726432208129555]
We systematically analyze the impact of class characteristics, training and test data composition, and modeling parameters on eight segmentation metrics.
Our findings support two adjustments to account for data biases in surgical data science.
arXiv Detail & Related papers (2024-03-25T21:08:26Z)
- PULASki: Learning inter-rater variability using statistical distances to improve probabilistic segmentation [35.34932609930401]
This work proposes the PULASki method as a computationally efficient generative tool for biomedical image segmentation. It captures variability in expert annotations, even in small datasets. Our experiments are also the first to present a comparative study of the computationally feasible segmentation of complex geometries using 3D patches and the traditional use of 2D slices.
arXiv Detail & Related papers (2023-12-25T10:31:22Z)
- Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion [78.76867266561537]
The evaluation process still heavily relies on closed-set metrics without considering the similarity between predicted and ground truth categories.
To tackle this issue, we first survey eleven similarity measurements between two categorical words.
We designed novel evaluation metrics, namely Open mIoU, Open AP, and Open PQ, tailored for three open-vocabulary segmentation tasks.
arXiv Detail & Related papers (2023-11-06T18:59:01Z)
- Deployment of Image Analysis Algorithms under Prevalence Shifts [6.373765910269204]
Domain gaps are among the most relevant roadblocks in the clinical translation of machine learning (ML)-based solutions for medical image analysis.
We propose a workflow for prevalence-aware image classification that uses estimated deployment prevalences to adjust a trained classifier to a new environment.
arXiv Detail & Related papers (2023-03-22T13:16:37Z)
- Understanding metric-related pitfalls in image analysis validation [59.15220116166561]
This work provides the first comprehensive common point of access to information on pitfalls related to validation metrics in image analysis.
Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy.
arXiv Detail & Related papers (2023-02-03T14:57:40Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
arXiv Detail & Related papers (2023-02-03T13:50:25Z)
- Metrics reloaded: Recommendations for image analysis validation [59.60445111432934]
Metrics Reloaded is a comprehensive framework guiding researchers in the problem-aware selection of metrics.
The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint.
Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics.
arXiv Detail & Related papers (2022-06-03T15:56:51Z)
- The Exploitation of Distance Distributions for Clustering [3.42658286826597]
In cluster analysis, different properties for distance distributions are judged to be relevant for appropriate distance selection.
By systematically investigating this specification using distribution analysis through a mirrored-density plot, it is shown that multimodal distance distributions are preferable in cluster analysis.
Experiments are performed on several artificial datasets and natural datasets for the task of clustering.
arXiv Detail & Related papers (2021-08-22T06:22:08Z)
- Common Limitations of Image Processing Metrics: A Picture Story [58.83274952067888]
This document focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection task.
The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.
arXiv Detail & Related papers (2021-04-12T17:03:42Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- Estimating informativeness of samples with Smooth Unique Information [108.25192785062367]
We measure how much a sample informs the final weights and how much it informs the function computed by the weights.
We give efficient approximations of these quantities using a linearized network.
We apply these measures to several problems, such as dataset summarization.
arXiv Detail & Related papers (2021-01-17T10:29:29Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Deep Representational Similarity Learning for analyzing neural signatures in task-based fMRI dataset [81.02949933048332]
This paper develops Deep Representational Similarity Learning (DRSL), a deep extension of Representational Similarity Analysis (RSA).
DRSL is appropriate for analyzing similarities between various cognitive tasks in fMRI datasets with a large number of subjects.
arXiv Detail & Related papers (2020-09-28T18:30:14Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data [4.550919471480445]
We develop a data-driven smoothing technique for high-dimensional and non-linear panel data models.
The weights are determined by a data-driven way and depend on the similarity between the corresponding functions.
We conduct a simulation study which shows that the prediction can be greatly improved by using our estimator.
arXiv Detail & Related papers (2019-12-30T09:50:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.