Related papers: In search of truth: Evaluating concordance of AI-based anatomy segmentation models

In search of truth: Evaluating concordance of AI-based anatomy segmentation models

URL: http://arxiv.org/abs/2512.15921v1
Date: Wed, 17 Dec 2025 19:33:56 GMT
Title: In search of truth: Evaluating concordance of AI-based anatomy segmentation models
Authors: Lena Giebeler, Deepa Krishnaswamy, David Clunie, Jakob Wasserthal, Lalith Kumar Shiyam Sundar, Andres Diaz-Pinto, Klaus H. Maier-Hein, Murong Xu, Bjoern Menze, Steve Pieper, Ron Kikinis, Andrey Fedorov,
Abstract summary: AI-based methods for anatomy segmentation can help automate characterization of large imaging datasets.<n>We introduce a practical framework to assist in evaluating them on datasets that do not contain ground truth annotations.
Score: 3.740726797046942
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Purpose AI-based methods for anatomy segmentation can help automate characterization of large imaging datasets. The growing number of similar in functionality models raises the challenge of evaluating them on datasets that do not contain ground truth annotations. We introduce a practical framework to assist in this task. Approach We harmonize the segmentation results into a standard, interoperable representation, which enables consistent, terminology-based labeling of the structures. We extend 3D Slicer to streamline loading and comparison of these harmonized segmentations, and demonstrate how standard representation simplifies review of the results using interactive summary plots and browser-based visualization using OHIF Viewer. To demonstrate the utility of the approach we apply it to evaluating segmentation of 31 anatomical structures (lungs, vertebrae, ribs, and heart) by six open-source models - TotalSegmentator 1.5 and 2.6, Auto3DSeg, MOOSE, MultiTalent, and CADS - for a sample of Computed Tomography (CT) scans from the publicly available National Lung Screening Trial (NLST) dataset. Results We demonstrate the utility of the framework in enabling automating loading, structure-wise inspection and comparison across models. Preliminary results ascertain practical utility of the approach in allowing quick detection and review of problematic results. The comparison shows excellent agreement segmenting some (e.g., lung) but not all structures (e.g., some models produce invalid vertebrae or rib segmentations). Conclusions The resources developed are linked from https://imagingdatacommons.github.io/segmentation-comparison/ including segmentation harmonization scripts, summary plots, and visualization tools. This work assists in model evaluation in absence of ground truth, ultimately enabling informed model selection.

Related papers

J-RAS: Enhancing Medical Image Segmentation via Retrieval-Augmented Joint Training [0.0]
We propose a joint training method for guided image segmentation that integrates a segmentation model with a retrieval model.<n>Both models are optimized, enabling the segmentation model to leverage retrieved image-mask pairs to enrich anatomical understanding.<n>We validate J-RAS across multiple segmentation backbones, including U-Net, TransUNet, SAM, and SegFormer, on two benchmark datasets.
arXiv Detail & Related papers (2025-10-11T01:53:28Z)
Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images [22.36128130052757]
We build a ground-truth-free evaluation model to assess the quality of segmentations generated by the Segment Anything Model (SAM) and its variants in medical imaging. This evaluation model estimates segmentation quality scores by analyzing the coherence and consistency between the input images and their corresponding segmentation predictions.
arXiv Detail & Related papers (2024-09-23T10:12:08Z)
Aggregated Attributions for Explanatory Analysis of 3D Segmentation Models [7.416913210816592]
We introduce Agg2Exp, a methodology for aggregating fine-grained voxel attributions of 3D segmentation models.<n>Our experiments show that gradient-based voxel attributions are more faithful to the model's predictions than perturbation-based explanations.<n>Agg2Exp facilitates the explanatory analysis of large segmentation models beyond their predictive performance.
arXiv Detail & Related papers (2024-07-23T17:14:01Z)
TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI [59.86827659781022]
A nnU-Net model (TotalSegmentator) was trained on MRI and segment 80atomic structures.<n>Dice scores were calculated between the predicted segmentations and expert reference standard segmentations to evaluate model performance.<n>Open-source, easy-to-use model allows for automatic, robust segmentation of 80 structures.
arXiv Detail & Related papers (2024-05-29T20:15:54Z)
Diffusion-based Data Augmentation for Nuclei Image Segmentation [68.28350341833526]
We introduce the first diffusion-based augmentation method for nuclei segmentation. The idea is to synthesize a large number of labeled images to facilitate training the segmentation model. The experimental results show that by augmenting 10% labeled real dataset with synthetic samples, one can achieve comparable segmentation results.
arXiv Detail & Related papers (2023-10-22T06:16:16Z)
Topological Data Analysis Guided Segment Anything Model Prompt Optimization for Zero-Shot Segmentation in Biological Imaging [5.795215830149858]
We propose topological data analysis guided prompt optimization for the Segment Anything Model (SAM) Our results show that the TDA optimized point cloud is much better suited for finding small objects and massively reduces computational complexity.
arXiv Detail & Related papers (2023-06-30T05:00:38Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
SegPrompt: Using Segmentation Map as a Better Prompt to Finetune Deep Models for Kidney Stone Classification [62.403510793388705]
Deep learning has produced encouraging results for kidney stone classification using endoscope images. The shortage of annotated training data poses a severe problem in improving the performance and generalization ability of the trained model. We propose SegPrompt to alleviate the data shortage problems by exploiting segmentation maps from two aspects.
arXiv Detail & Related papers (2023-03-15T01:30:48Z)
Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes. Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
Unsupervised Learning Consensus Model for Dynamic Texture Videos Segmentation [12.462608802359936]
We present an effective unsupervised learning consensus model for the segmentation of dynamic texture (ULCM) In the proposed model, the set of values of the requantized local binary patterns (LBP) histogram around the pixel to be classified are used as features. Experiments conducted on the challenging SynthDB dataset show that ULCM is significantly faster, easier to code, simple and has limited parameters.
arXiv Detail & Related papers (2020-06-29T16:40:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.