VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis
- URL: http://arxiv.org/abs/2405.05667v1
- Date: Thu, 9 May 2024 10:41:18 GMT
- Title: VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis
- Authors: Zhihan Ju, Wanting Zhou,
- Abstract summary: We propose the Vision Mamba DDPM (VM-DDPM) based on State Space Model (SSM)
To our best knowledge, this is the first medical image synthesis model based on the SSM-CNN hybrid architecture.
Our experimental evaluation on three datasets of different scales, i.e., ACDC, BraTS2018, and ChestXRay, demonstrate that VM-DDPM achieves state-of-the-art performance.
- Score: 0.8111815974227898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of smart healthcare, researchers enhance the scale and diversity of medical datasets through medical image synthesis. However, existing methods are limited by CNN local perception and Transformer quadratic complexity, making it difficult to balance structural texture consistency. To this end, we propose the Vision Mamba DDPM (VM-DDPM) based on State Space Model (SSM), fully combining CNN local perception and SSM global modeling capabilities, while maintaining linear computational complexity. Specifically, we designed a multi-level feature extraction module called Multi-level State Space Block (MSSBlock), and a basic unit of encoder-decoder structure called State Space Layer (SSLayer) for medical pathological images. Besides, we designed a simple, Plug-and-Play, zero-parameter Sequence Regeneration strategy for the Cross-Scan Module (CSM), which enabled the S6 module to fully perceive the spatial features of the 2D image and stimulate the generalization potential of the model. To our best knowledge, this is the first medical image synthesis model based on the SSM-CNN hybrid architecture. Our experimental evaluation on three datasets of different scales, i.e., ACDC, BraTS2018, and ChestXRay, as well as qualitative evaluation by radiologists, demonstrate that VM-DDPM achieves state-of-the-art performance.
Related papers
- Unifying Subsampling Pattern Variations for Compressed Sensing MRI with Neural Operators [72.79532467687427]
Compressed Sensing MRI reconstructs images of the body's internal anatomy from undersampled and compressed measurements.
Deep neural networks have shown great potential for reconstructing high-quality images from highly undersampled measurements.
We propose a unified model that is robust to different subsampling patterns and image resolutions in CS-MRI.
arXiv Detail & Related papers (2024-10-05T20:03:57Z) - HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation [1.5574423250822542]
We propose a U-shape architecture model for medical image segmentation, named Hybird Transformer vision Mamba UNet (HTM-UNet)
We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB, ETIS-Larib PolypDB public datasets and ZD-LCI-GIM private dataset.
arXiv Detail & Related papers (2024-08-21T02:25:14Z) - I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling [8.909355958696414]
We propose a novel adversarial model for medical image synthesis, I2I-Mamba, to efficiently capture long-range context.
I2I-Mamba offers superior performance against state-of-the-art CNN- and transformer-based methods in synthesizing target-modality images.
arXiv Detail & Related papers (2024-05-22T21:55:58Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark
Detection with State Space Model [24.955052600683423]
In this paper, we introduce nnMamba, a novel architecture that integrates the strengths of CNNs and the advanced long-range modeling capabilities of State Space Sequence Models (SSMs)
Experiments on 6 datasets demonstrate nnMamba's superiority over state-of-the-art methods in a suite of challenging tasks, including 3D image segmentation, classification, and landmark detection.
arXiv Detail & Related papers (2024-02-05T21:28:47Z) - VM-UNet: Vision Mamba UNet for Medical Image Segmentation [3.170171905334503]
We propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet)
We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks.
arXiv Detail & Related papers (2024-02-04T13:37:21Z) - BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP
for Generic Natural Visual Stimulus Decoding [51.911473457195555]
BrainCLIP is a task-agnostic fMRI-based brain decoding model.
It bridges the modality gap between brain activity, image, and text.
BrainCLIP can reconstruct visual stimuli with high semantic fidelity.
arXiv Detail & Related papers (2023-02-25T03:28:54Z) - Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstruct the informative patches according to the gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z) - Fed-Sim: Federated Simulation for Medical Imaging [131.56325440976207]
We introduce a physics-driven generative approach that consists of two learnable neural modules.
We show that our data synthesis framework improves the downstream segmentation performance on several datasets.
arXiv Detail & Related papers (2020-09-01T19:17:46Z) - Hierarchical Amortized Training for Memory-efficient High Resolution 3D
GAN [52.851990439671475]
We propose a novel end-to-end GAN architecture that can generate high-resolution 3D images.
We achieve this goal by using different configurations between training and inference.
Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms state of the art in image generation.
arXiv Detail & Related papers (2020-08-05T02:33:04Z) - Neural Architecture Search for Gliomas Segmentation on Multimodal
Magnetic Resonance Imaging [2.66512000865131]
We propose a neural architecture search (NAS) based solution to brain tumor segmentation tasks on multimodal MRI scans.
The developed solution also integrates normalization and patching strategies tailored for brain MRI processing.
arXiv Detail & Related papers (2020-05-13T14:32:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.