Unleashing Diffusion and State Space Models for Medical Image Segmentation
- URL: http://arxiv.org/abs/2506.12747v2
- Date: Tue, 01 Jul 2025 07:16:34 GMT
- Title: Unleashing Diffusion and State Space Models for Medical Image Segmentation
- Authors: Rong Wu, Ziqi Chen, Liming Zhong, Heng Li, Hai Shu,
- Abstract summary: Existing segmentation models often lack robustness when encountering unseen organs or tumors.<n>We propose DSM, a framework that leverages diffusion and state space models to segment unseen tumor categories beyond the training data.<n>DSM learns organ queries using an object-aware feature grouping strategy to capture organ-level visual features.<n>It then refines tumor queries by focusing on diffusion-based visual prompts, enabling precise segmentation of previously unseen tumors.
- Score: 5.4377770015041795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing segmentation models trained on a single medical imaging dataset often lack robustness when encountering unseen organs or tumors. Developing a robust model capable of identifying rare or novel tumor categories not present during training is crucial for advancing medical imaging applications. We propose DSM, a novel framework that leverages diffusion and state space models to segment unseen tumor categories beyond the training data. DSM utilizes two sets of object queries trained within modified attention decoders to enhance classification accuracy. Initially, the model learns organ queries using an object-aware feature grouping strategy to capture organ-level visual features. It then refines tumor queries by focusing on diffusion-based visual prompts, enabling precise segmentation of previously unseen tumors. Furthermore, we incorporate diffusion-guided feature fusion to improve semantic segmentation performance. By integrating CLIP text embeddings, DSM captures category-sensitive classes to improve linguistic transfer knowledge, thereby enhancing the model's robustness across diverse scenarios and multi-label tasks. Extensive experiments demonstrate the superior performance of DSM in various tumor segmentation tasks. Code is available at https://github.com/Rows21/k-Means_Mask_Mamba.
Related papers
- Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation [56.52520416420957]
We propose Multimodal Causal-Driven Representation Learning (MCDRL) to tackle domain generalization in medical image segmentation.<n>MCDRL consistently outperforms competing methods, yielding superior segmentation accuracy and exhibiting robust generalizability.
arXiv Detail & Related papers (2025-08-07T03:41:41Z) - UniSegDiff: Boosting Unified Lesion Segmentation via a Staged Diffusion Model [53.34835793648352]
We propose UniSegDiff, a novel diffusion model framework for lesion segmentation.<n>UniSegDiff addresses lesion segmentation in a unified manner across multiple modalities and organs.<n> Comprehensive experimental results demonstrate that UniSegDiff significantly outperforms previous state-of-the-art (SOTA) approaches.
arXiv Detail & Related papers (2025-07-24T12:33:10Z) - Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models [11.774375458215193]
Generalizable Tumor aims to train a single model for zero-shot tumor segmentation across diverse anatomical regions.<n>DiffuGTS creates anomaly-aware open-vocabulary attention maps based on text prompts.<n>Experiments on four datasets and seven tumor categories demonstrate the superior performance of our method.
arXiv Detail & Related papers (2025-05-05T16:05:37Z) - MRGen: Segmentation Data Engine For Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically significant imaging modalities is challenging due to the scarcity of annotated data.<n>This paper investigates leveraging generative models to synthesize training data, to train segmentation models for underrepresented modalities.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation [2.2585213273821716]
We introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans.<n>Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss.<n>We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further.
arXiv Detail & Related papers (2024-09-28T23:10:37Z) - multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information [1.7359724605901228]
Brain Tumor distances (BraTS) plays a critical role in clinical diagnosis, treatment planning, and monitoring the progression of brain tumors.
Due to the variability in tumor appearance, size, and intensity across different MRI modalities, automated segmentation remains a challenging task.
We propose a novel Transformer-based framework, multiPI-TransBTS, which integrates multi-physical information to enhance segmentation accuracy.
arXiv Detail & Related papers (2024-09-18T17:35:19Z) - SMAFormer: Synergistic Multi-Attention Transformer for Medical Image Segmentation [28.490211130600702]
We introduce SMAFormer, a Transformer-based architecture that fuses multiple attention mechanisms for enhanced segmentation of small tumors and organs.<n> SMAFormer can capture both local and global features for medical image segmentation.<n>We conduct extensive experiments on various medical image segmentation tasks, including multi-organ, liver tumor, and bladder tumor segmentation, achieving state-of-the-art results.
arXiv Detail & Related papers (2024-08-31T04:23:33Z) - Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography [50.08496922659307]
We propose a universal framework enabling a single model, termed Universal Model, to deal with multiple public datasets and adapt to new classes.
Firstly, we introduce a novel language-driven parameter generator that leverages language embeddings from large language models.
Secondly, the conventional output layers are replaced with lightweight, class-specific heads, allowing Universal Model to simultaneously segment 25 organs and six types of tumors.
arXiv Detail & Related papers (2024-05-28T16:55:15Z) - Learning from partially labeled data for multi-organ and tumor
segmentation [102.55303521877933]
We propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple datasets.
A dynamic head enables the network to accomplish multiple segmentation tasks flexibly.
We create a large-scale partially labeled Multi-Organ and Tumor benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors.
arXiv Detail & Related papers (2022-11-13T13:03:09Z) - Optimized Global Perturbation Attacks For Brain Tumour ROI Extraction
From Binary Classification Models [0.304585143845864]
We propose a weakly supervised approach to obtain regions of interest using binary class labels.
We also propose a novel objective function to train the generator model based on a pretrained binary classification model.
arXiv Detail & Related papers (2022-11-09T14:52:36Z) - Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z) - Few-shot Medical Image Segmentation using a Global Correlation Network
with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance discriminability of deep embedding to encourage clustering of the feature domains of the same class.
arXiv Detail & Related papers (2020-12-10T04:01:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.