Uncovering Modality Discrepancy and Generalization Illusion for General-Purpose 3D Medical Segmentation
- URL: http://arxiv.org/abs/2602.07643v1
- Date: Sat, 07 Feb 2026 17:54:10 GMT
- Title: Uncovering Modality Discrepancy and Generalization Illusion for General-Purpose 3D Medical Segmentation
- Authors: Yichi Zhang, Feiyang Xiao, Le Xue, Wenbo Zhang, Gang Feng, Chenguang Zheng, Yuan Qi, Yuan Cheng, Zixin Hu,
- Abstract summary: 3D medical foundation models are envisioned as versatile tools that offer general-purpose capabilities. This dataset comprises 490 whole-body PET/CT and 464 whole-body PET/MRI scans.
- Score: 26.70378621396797
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While emerging 3D medical foundation models are envisioned as versatile tools that offer general-purpose capabilities, their validation remains largely confined to regional and structural imaging, leaving a significant modality discrepancy unexplored. To provide a rigorous and objective assessment, we curate the UMD dataset comprising 490 whole-body PET/CT and 464 whole-body PET/MRI scans ($\sim$675k 2D images, $\sim$12k 3D organ annotations) and conduct a comprehensive evaluation of representative 3D segmentation foundation models. Through intra-subject controlled comparisons of paired scans, we isolate imaging modality as the primary independent variable to evaluate model robustness in real-world applications. Our evaluation reveals a stark discrepancy between literature-reported benchmarks and real-world efficacy, particularly when transitioning from structural to functional domains. Such systemic failures underscore that current 3D foundation models are far from achieving truly general-purpose status, necessitating a paradigm shift toward multi-modal training and evaluation to bridge the gap between idealized benchmarking and comprehensive clinical utility. This dataset and analysis establish a cornerstone for future research to develop truly modality-agnostic medical foundation models.
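To make the evaluation protocol concrete, here is a minimal sketch (not the authors' code) of the intra-subject controlled comparison: per-organ Dice is computed on a subject's paired PET/CT and PET/MRI predictions, so the CT-to-MRI drop isolates modality as the only variable. Function names, the organ label, and array shapes are illustrative.

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient between two binary 3D masks."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom > 0 else 1.0

def modality_gap(pred_ct, gt_ct, pred_mri, gt_mri, organ_ids):
    """Per-organ Dice on a subject's paired scans and the CT-to-MRI drop."""
    gaps = {}
    for organ in organ_ids:
        d_ct = dice(pred_ct == organ, gt_ct == organ)
        d_mri = dice(pred_mri == organ, gt_mri == organ)
        gaps[organ] = {"ct": d_ct, "mri": d_mri, "drop": d_ct - d_mri}
    return gaps

# Toy example with a single hypothetical organ label (1): perfect on "CT",
# degraded on "MRI", mimicking the structural-to-functional performance gap.
rng = np.random.default_rng(0)
gt = (rng.random((16, 16, 16)) > 0.7).astype(int)
noisy = gt * (rng.random(gt.shape) > 0.3)
print(modality_gap(gt, gt, noisy, gt, organ_ids=[1]))
```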
Related papers
- Med3D-R1: Incentivizing Clinical Reasoning in 3D Medical Vision-Language Models for Abnormality Diagnosis [20.302134776419955]
We propose a reinforcement learning framework with a two-stage training process: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). In the RL stage, we redesign the consistency reward to explicitly promote coherent, step-by-step diagnostic reasoning. Our model attains state-of-the-art accuracies of 41.92% on CT-RATE and 44.99% on RAD-ChestCT.
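A hedged sketch of what a consistency-style reward might look like; the paper's exact formulation is not given in the summary, so the tags (`<think>`, `<answer>`) and the weights below are assumptions.

```python
# Illustrative consistency reward: the final answer must agree with the
# reasoning chain that precedes it. Tag format and weights are assumed.
import re

def consistency_reward(response: str, gold: str) -> float:
    m = re.search(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>",
                  response, flags=re.S)
    if m is None:
        return 0.0                       # malformed output gets no reward
    think, answer = (s.strip().lower() for s in m.groups())
    correct = 1.0 if answer == gold.lower() else 0.0
    # Consistency bonus: the reasoning must actually state the answer.
    consistent = 0.5 if answer and answer in think else 0.0
    return correct + consistent

print(consistency_reward(
    "<think>Lesion in the right lobe suggests pneumonia.</think>"
    "<answer>pneumonia</answer>", "pneumonia"))   # -> 1.5
```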
arXiv Detail & Related papers (2026-02-01T12:43:11Z) - Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach [57.86418347491272]
We propose a comprehensive hierarchical classification system with 404 representative abnormal findings across all body regions. We contribute a dataset containing over 14.5K CT images from multiple planes and all human body regions, and meticulously provide grounding annotations for over 19K abnormalities. We propose OminiAbnorm-CT, which can automatically ground and describe abnormal findings on multi-plane and whole-body CT images based on text queries.
arXiv Detail & Related papers (2025-06-03T17:57:34Z) - AI-Assisted Colonoscopy: Polyp Detection and Segmentation using Foundation Models [0.10037949839020764]
In colonoscopy, 80% of missed polyps could be detected with the help of deep learning models. In the search for algorithms capable of addressing this challenge, foundation models emerge as promising candidates. Their zero-shot and few-shot learning capabilities facilitate generalization to new data or tasks without extensive fine-tuning. A comprehensive evaluation of foundation models for polyp segmentation was conducted, assessing both detection and delimitation.
arXiv Detail & Related papers (2025-03-31T14:20:53Z) - 3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography [5.65192078662102]
We introduce FM-CT, a foundation model for head CT enabling generalizable disease detection, trained using self-supervised learning. Our approach pre-trains a deep learning model on a large, diverse dataset of 361,663 non-contrast 3D head CT scans without the need for manual annotations. Our results demonstrate that the self-supervised foundation model significantly improves performance on downstream diagnostic tasks.
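The summary does not name FM-CT's pretraining objective, so the sketch below uses masked-volume reconstruction purely as a representative annotation-free objective on 3D CT; the network and tensor shapes are toy stand-ins.

```python
# Annotation-free pretraining sketch: hide most voxels, reconstruct them,
# and score the loss only on the hidden region. Objective is assumed.
import torch
import torch.nn as nn

encoder = nn.Sequential(                        # tiny 3D CNN stand-in
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

volume = torch.randn(2, 1, 32, 32, 32)          # fake head-CT mini-batch
mask = (torch.rand_like(volume) > 0.6).float()  # keep 40%, hide 60%
recon = encoder(volume * mask)
loss = ((recon - volume) ** 2 * (1 - mask)).mean()  # hidden voxels only
loss.backward(); opt.step()
```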
arXiv Detail & Related papers (2025-02-04T23:42:18Z) - SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation [36.95030121663565]
Supervised Finetuning (SFT) serves as an effective way to adapt foundation models to specific downstream tasks.
We propose SAM-Med3D-MoE, a novel framework that seamlessly integrates task-specific finetuned models with the foundational model.
Our experiments demonstrate the efficacy of SAM-Med3D-MoE, with an average Dice score increase from 53.0 to 56.4 across 15 specific classes.
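A minimal sketch of the mixture-of-experts pattern described above, with architectural details assumed: a gating network weighs a frozen foundation model against task-specific finetuned experts, so adding an expert does not overwrite previously learned classes.

```python
# MoE fusion for 3D segmentation: gate over [foundation, expert_1, ...].
import torch
import torch.nn as nn

class SegMoE(nn.Module):
    def __init__(self, experts: nn.ModuleList, feat_dim: int):
        super().__init__()
        self.experts = experts                  # frozen foundation + experts
        self.gate = nn.Linear(feat_dim, len(experts))

    def forward(self, x: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate(feat), dim=-1)                 # (B, E)
        logits = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, ...)
        w = w.view(*w.shape, 1, 1, 1, 1)        # broadcast over C, D, H, W
        return (w * logits).sum(dim=1)          # fused segmentation logits

experts = nn.ModuleList([nn.Conv3d(1, 2, 1) for _ in range(3)])
moe = SegMoE(experts, feat_dim=8)
out = moe(torch.randn(2, 1, 16, 16, 16), torch.randn(2, 8))
print(out.shape)   # torch.Size([2, 2, 16, 16, 16])
```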
arXiv Detail & Related papers (2024-07-06T03:03:45Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697,000 radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LLaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion [32.71972792352939]
3D patient body modeling is critical to the success of automated patient positioning for smart medical scanning and operating rooms.
Existing CNN-based end-to-end patient modeling solutions typically require customized network designs demanding large amounts of relevant training data.
We propose a generic, modularized 3D patient modeling method that includes a multi-modal keypoint detection module with attentive fusion for 2D patient joint localization.
We demonstrate the efficacy of the proposed method by extensive patient positioning experiments on both public and clinical data.
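A sketch of attentive multi-modal fusion for the keypoint module; the modalities (RGB and depth) and feature dimensions are assumptions, since the summary does not specify them.

```python
# Per-pixel attention decides each modality's contribution before the
# keypoint heatmap head. Channel counts and joint count are illustrative.
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, channels: int, num_joints: int):
        super().__init__()
        self.attn = nn.Conv2d(2 * channels, 2, kernel_size=1)
        self.head = nn.Conv2d(channels, num_joints, kernel_size=1)

    def forward(self, feat_rgb, feat_depth):
        w = torch.softmax(self.attn(torch.cat([feat_rgb, feat_depth], 1)), 1)
        fused = w[:, :1] * feat_rgb + w[:, 1:] * feat_depth
        return self.head(fused)                 # per-joint heatmaps

fuse = AttentiveFusion(channels=32, num_joints=17)
heatmaps = fuse(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
print(heatmaps.shape)   # torch.Size([1, 17, 64, 64])
```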
arXiv Detail & Related papers (2024-03-05T18:58:55Z) - Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies.
Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z) - Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data [66.9359934608229]
This study aims to initiate the development of a Radiology Foundation Model, termed RadFM.
To the best of our knowledge, this is the first large-scale, high-quality, medical visual-language dataset, with both 2D and 3D scans.
We propose a new evaluation benchmark, RadBench, that comprises five tasks, including modality recognition, disease diagnosis, visual question answering, report generation and rationale diagnosis.
arXiv Detail & Related papers (2023-08-04T17:00:38Z) - Deep Implicit Statistical Shape Models for 3D Medical Image Delineation [47.78425002879612]
3D delineation of anatomical structures is a cardinal goal in medical imaging analysis.
Prior to deep learning, statistical shape models (SSMs) that imposed anatomical constraints and produced high-quality surfaces were a core technology.
We present deep implicit statistical shape models (DISSMs), a new approach to delineation that marries the representation power of CNNs with the robustness of SSMs.
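A condensed sketch of the implicit-shape idea (layer sizes and the training loop are omitted): a low-dimensional shape code stands in for the SSM's modes of variation, and an MLP maps (code, xyz) to a signed distance whose zero level set is the delineated surface.

```python
# DeepSDF-style implicit shape decoder; dimensions are assumptions.
import torch
import torch.nn as nn

class ImplicitShape(nn.Module):
    def __init__(self, code_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(code_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),               # signed distance to surface
        )

    def forward(self, code: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        b, n, _ = xyz.shape
        inp = torch.cat([code.unsqueeze(1).expand(b, n, -1), xyz], dim=-1)
        return self.mlp(inp).squeeze(-1)        # (B, N) signed distances

sdf = ImplicitShape()
d = sdf(torch.randn(2, 64), torch.rand(2, 4096, 3))
print(d.shape)   # torch.Size([2, 4096]); surface = {x : d(x) = 0}
```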
arXiv Detail & Related papers (2021-04-07T01:15:06Z) - Fader Networks for domain adaptation on fMRI: ABIDE-II study [68.5481471934606]
We use 3D convolutional autoencoders to build a domain-irrelevant latent-space image representation and demonstrate that this method outperforms existing approaches on ABIDE data.
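A schematic of the fader-style setup, with shapes and loss weights assumed: an autoencoder reconstructs the volume while the encoder is trained adversarially against a site classifier on the latent code, pushing acquisition-site information out of the representation.

```python
# Adversarial site-invariance on the autoencoder latent; toy dimensions.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Conv3d(1, 8, 4, stride=2, padding=1), nn.ReLU(),
                    nn.Flatten(), nn.Linear(8 * 16 ** 3, 64))
dec = nn.Sequential(nn.Linear(64, 32 ** 3))
site_clf = nn.Sequential(nn.Linear(64, 5))      # 5 hypothetical sites

x = torch.randn(4, 1, 32, 32, 32)               # fake fMRI-derived volumes
site = torch.randint(0, 5, (4,))
z = enc(x)
recon_loss = ((dec(z) - x.flatten(1)) ** 2).mean()
adv_loss = -nn.functional.cross_entropy(site_clf(z), site)  # fool the clf
encoder_loss = recon_loss + 0.1 * adv_loss
# (The site classifier itself is updated on a separate step with +CE.)
```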
arXiv Detail & Related papers (2020-10-14T16:50:50Z) - Shape-aware Meta-learning for Generalizing Prostate MRI Segmentation to Unseen Domains [68.73614619875814]
We present a novel shape-aware meta-learning scheme to improve the model generalization in prostate MRI segmentation.
Experimental results show that our approach outperforms many state-of-the-art generalization methods consistently across all six settings of unseen domains.
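A schematic first-order episode of such a meta-learning scheme; the shape term below is a simple compactness surrogate, not the paper's exact shape-aware loss, and the networks are toys.

```python
# Meta-train on source domains, then require the update to also help a
# held-out meta-test domain, with a shape prior on the prediction.
import torch
import torch.nn as nn

net = nn.Conv2d(1, 2, 3, padding=1)             # toy segmentation net
opt = torch.optim.SGD(net.parameters(), lr=0.1)

def seg_loss(logits, target):
    return nn.functional.cross_entropy(logits, target)

def shape_loss(logits):
    # Compactness surrogate: penalize scattered foreground probability.
    p = torch.softmax(logits, dim=1)[:, 1]
    return p.var()                              # placeholder shape prior

x_tr, y_tr = torch.randn(2, 1, 32, 32), torch.randint(0, 2, (2, 32, 32))
x_te, y_te = torch.randn(2, 1, 32, 32), torch.randint(0, 2, (2, 32, 32))

inner = seg_loss(net(x_tr), y_tr)               # meta-train domains
outer = seg_loss(net(x_te), y_te) + 0.1 * shape_loss(net(x_te))
(inner + outer).backward()   # first-order schematic; true MAML adapts first
opt.step()
```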
arXiv Detail & Related papers (2020-07-04T07:56:02Z)