A methodology for clinically driven interactive segmentation evaluation
- URL: http://arxiv.org/abs/2510.09499v1
- Date: Fri, 10 Oct 2025 16:00:06 GMT
- Title: A methodology for clinically driven interactive segmentation evaluation
- Authors: Parhom Esmaeili, Virginia Fernandez, Pedro Borges, Eli Gibson, Sebastien Ourselin, M. Jorge Cardoso
- Abstract summary: Interactive segmentation is a promising strategy for building robust, generalisable algorithms for medical image segmentation. However, inconsistent and clinically unrealistic evaluation hinders fair comparison and misrepresents real-world performance. We propose a clinically grounded methodology for defining evaluation tasks and metrics, and built a software framework for constructing standardised evaluation pipelines.
- Score: 1.1425176528359444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interactive segmentation is a promising strategy for building robust, generalisable algorithms for volumetric medical image segmentation. However, inconsistent and clinically unrealistic evaluation hinders fair comparison and misrepresents real-world performance. We propose a clinically grounded methodology for defining evaluation tasks and metrics, and built a software framework for constructing standardised evaluation pipelines. We evaluate state-of-the-art algorithms across heterogeneous and complex tasks and observe that (i) minimising information loss when processing user interactions is critical for model robustness, (ii) adaptive-zooming mechanisms boost robustness and speed convergence, (iii) performance drops if validation prompting behaviour/budgets differ from training, (iv) 2D methods perform well with slab-like images and coarse targets, but 3D context helps with large or irregularly shaped targets, (v) performance of non-medical-domain models (e.g. SAM2) degrades with poor contrast and complex shapes.
Related papers
- From Perception to Action: An Interactive Benchmark for Vision Reasoning [51.11355591375073]
The Causal Hierarchy of Actions and Interactions (CHAIN) benchmark is designed to evaluate whether models can understand, plan, and execute structured action sequences grounded in physical constraints. CHAIN shifts evaluation from passive perception to active problem solving, spanning tasks such as interlocking mechanical puzzles and 3D stacking and packing. Our results show that top-performing models still struggle to internalize physical structure and causal constraints, often failing to produce reliable long-horizon plans or to robustly translate perceived structure into effective actions.
arXiv Detail & Related papers (2026-02-24T15:33:02Z)
- Mitigating Shortcut Learning via Feature Disentanglement in Medical Imaging: A Benchmark Study [6.364545942101905]
Feature disentanglement is a promising approach to mitigate shortcut learning. Shortcut mitigation methods improved classification performance under strong spurious correlations. The best-performing models combine data-centric rebalancing with model-centric disentanglement.
arXiv Detail & Related papers (2026-02-17T10:51:58Z)
- Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations [4.581671524490035]
We propose an end-to-end Staged Voxel-Level Deep Reinforcement Learning framework for robust medical image segmentation under noisy annotations. This framework employs a dynamic iterative update strategy to automatically mitigate the impact of erroneous labels without requiring manual intervention.
arXiv Detail & Related papers (2026-01-07T12:39:54Z)
- Explainable Human-in-the-Loop Segmentation via Critic Feedback Signals [0.20999222360659608]
We propose a human-in-the-loop interactive framework that enables interventional learning through targeted human corrections of segmentation outputs. We demonstrate that our framework improves segmentation accuracy by up to 9 mIoU points on challenging cubemap data. This work provides a practical framework for researchers and practitioners seeking to build segmentation systems that are accurate, robust to dataset biases, data-efficient, and adaptable to real-world domains such as urban climate monitoring and autonomous driving.
arXiv Detail & Related papers (2025-10-11T01:16:41Z)
- Explicit modelling of subject dependency in BCI decoding [12.17288254938554]
Brain-Computer Interfaces (BCIs) suffer from high inter-subject variability and limited labeled data. We present an end-to-end approach that explicitly models the subject dependency using lightweight convolutional neural networks (CNNs) conditioned on the subject's identity.
arXiv Detail & Related papers (2025-09-27T10:51:42Z)
- Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures. We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z)
- Evaluation of Seismic Artificial Intelligence with Uncertainty [0.0]
We develop an evaluation framework for evaluating and comparing deep learning models (DLMs). Our framework helps practitioners choose the best model for their problem and set performance expectations.
arXiv Detail & Related papers (2025-01-15T16:45:51Z)
- Pitfalls of topology-aware image segmentation [81.19923502845441]
We identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts, and inappropriate use of evaluation metrics. We propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.
arXiv Detail & Related papers (2024-12-19T08:11:42Z)
- Can foundation models actively gather information in interactive environments to test hypotheses? [43.42688356541211]
Foundation models excel at single-turn reasoning but struggle with multi-turn exploration in dynamic environments. We evaluated these models on their ability to learn from experience, adapt, and gather information.
arXiv Detail & Related papers (2024-12-09T12:27:21Z)
- RadioActive: 3D Radiological Interactive Segmentation Benchmark [1.1095764130645482]
Recent interactive segmentation models, inspired by Meta's Segment Anything, have made significant progress but face critical limitations in 3D. The RadioActive benchmark addresses these challenges by providing a rigorous and reproducible evaluation framework. Surprisingly, SAM2 outperforms all specialized medical 2D and 3D models in a setting requiring only a few interactions to generate prompts for a 3D volume.
arXiv Detail & Related papers (2024-11-12T15:47:17Z)
- Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms [14.82820088479196]
Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks.
In this work, we contribute tools to perform such a comprehensive evaluation.
We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time.
arXiv Detail & Related papers (2024-05-27T14:03:28Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Scalable Intervention Target Estimation in Linear Models [52.60799340056917]
Current approaches to causal structure learning either work with known intervention targets or use hypothesis testing to discover the unknown intervention targets.
This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets.
The proposed algorithm can be used to also update a given observational Markov equivalence class into the interventional Markov equivalence class.
arXiv Detail & Related papers (2021-11-15T03:16:56Z)
- Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.