VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation
- URL: http://arxiv.org/abs/2403.09157v1
- Date: Thu, 14 Mar 2024 08:12:39 GMT
- Title: VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation
- Authors: Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao,
- Abstract summary: We propose Vison Mamba-UNetV2, inspired by Mamba architecture, to capture contextual information in images.
VM-UNetV2 exhibits competitive performance in medical image segmentation tasks.
We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir CVC-ColonDB and ETIS-LaribPolypDB public datasets.
- Score: 8.278068663433261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. Recently, State Space Models (SSMs), such as Mamba, have been recognized as a promising method. They not only demonstrate superior performance in modeling long-range interactions, but also preserve a linear computational complexity. Inspired by the Mamba architecture, We proposed Vison Mamba-UNetV2, the Visual State Space (VSS) Block is introduced to capture extensive contextual information, the Semantics and Detail Infusion (SDI) is introduced to augment the infusion of low-level and high-level features. We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB and ETIS-LaribPolypDB public datasets. The results indicate that VM-UNetV2 exhibits competitive performance in medical image segmentation tasks. Our code is available at https://github.com/nobodyplayer1/VM-UNetV2.
Related papers
- V2M: Visual 2-Dimensional Mamba for Image Representation Learning [68.51380287151927]
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences.
Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images into patches and then regarding them as a 1D sequence.
We propose a Visual 2-Dimensional Mamba model as a complete solution, which directly processes image tokens in the 2D space.
arXiv Detail & Related papers (2024-10-14T11:11:06Z) - Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters [12.182070604073585]
CNNs struggle with modeling long-range dependencies, limiting their ability to fully utilize semantic information in images.
Transformers are hampered by the complexity of quadratic computations.
We propose a model based on the Mamba architecture: Microscopic-Mamba.
arXiv Detail & Related papers (2024-09-12T10:01:33Z) - HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation [1.5574423250822542]
We propose a U-shape architecture model for medical image segmentation, named Hybird Transformer vision Mamba UNet (HTM-UNet)
We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB, ETIS-Larib PolypDB public datasets and ZD-LCI-GIM private dataset.
arXiv Detail & Related papers (2024-08-21T02:25:14Z) - MambaVC: Learned Visual Compression with Selective State Spaces [74.29217829932895]
We introduce MambaVC, a simple, strong and efficient compression network based on SSM.
MambaVC develops a visual state space (VSS) block with a 2D selective scanning (2DSS) module as the nonlinear activation function after each downsampling.
On compression benchmark datasets, MambaVC achieves superior rate-distortion performance with lower computational and memory overheads.
arXiv Detail & Related papers (2024-05-24T10:24:30Z) - Rotate to Scan: UNet-like Mamba with Triplet SSM Module for Medical Image Segmentation [8.686237221268584]
We propose Triplet Mamba-UNet as a new type of image segmentation network.
Our model achieves a one-third reduction in parameters compared to the previous VM-UNet.
arXiv Detail & Related papers (2024-03-26T13:40:18Z) - Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation [21.1787366866505]
We propose Mamba-UNet, a novel architecture that synergizes the U-Net in medical image segmentation with Mamba's capability.
Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure, infused with skip connections to preserve spatial information across different scales of the network.
arXiv Detail & Related papers (2024-02-07T18:33:04Z) - VM-UNet: Vision Mamba UNet for Medical Image Segmentation [3.170171905334503]
We propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet)
We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks.
arXiv Detail & Related papers (2024-02-04T13:37:21Z) - Vivim: a Video Vision Mamba for Medical Video Segmentation [52.11785024350253]
This paper presents a Video Vision Mamba-based framework, dubbed as Vivim, for medical video segmentation tasks.
Our Vivim can effectively compress the long-term representation into sequences at varying scales.
Experiments on thyroid segmentation, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim.
arXiv Detail & Related papers (2024-01-25T13:27:03Z) - VMamba: Visual State Space Model [92.83984290020891]
VMamba is a vision backbone that works in linear time complexity.
At the core of VMamba lies a stack of Visual State-Space (VSS) blocks with the 2D Selective Scan (SS2D) module.
arXiv Detail & Related papers (2024-01-18T17:55:39Z) - Vision Mamba: Efficient Visual Representation Learning with
Bidirectional State Space Model [51.10876815815515]
We propose a new generic vision backbone with bidirectional Mamba blocks (Vim)
Vim marks the image sequences with position embeddings and compresses the visual representation with bidirectional state space models.
The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images.
arXiv Detail & Related papers (2024-01-17T18:56:18Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.