Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?
- URL: http://arxiv.org/abs/2406.16993v1
- Date: Mon, 24 Jun 2024 08:01:05 GMT
- Title: Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?
- Authors: Pallabi Dutta, Soham Bose, Swalpa Kumar Roy, Sushmita Mitra,
- Abstract summary: This paper investigates the integration of CNNs and Vision Extended Long Short-Term Memory (Vision-xLSTM) models by introducing a novel approach called UVixLSTM.
The Vision-xLSTM blocks captures temporal and global relationships within the patches extracted from the CNN feature maps.
UVixLSTM exhibits superior performance compared to state-of-the-art networks on the publicly-available dataset.
- Score: 3.1777394653936937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advancement of developing efficient medical image segmentation has evolved from initial dependence on Convolutional Neural Networks (CNNs) to the present investigation of hybrid models that combine CNNs with Vision Transformers. Furthermore, there is an increasing focus on creating architectures that are both high-performing in medical image segmentation tasks and computationally efficient to be deployed on systems with limited resources. Although transformers have several advantages like capturing global dependencies in the input data, they face challenges such as high computational and memory complexity. This paper investigates the integration of CNNs and Vision Extended Long Short-Term Memory (Vision-xLSTM) models by introducing a novel approach called UVixLSTM. The Vision-xLSTM blocks captures temporal and global relationships within the patches extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks to produce the segmentation output. Our primary objective is to propose that Vision-xLSTM forms a reliable backbone for medical image segmentation tasks, offering excellent segmentation performance and reduced computational complexity. UVixLSTM exhibits superior performance compared to state-of-the-art networks on the publicly-available Synapse dataset. Code is available at: https://github.com/duttapallabi2907/UVixLSTM
Related papers
- TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation [6.013821375459473]
We introduce a novel deep learning architecture for medical image segmentation.
Our proposed model shows consistent improvement over the state of the art on ten publicly available datasets.
arXiv Detail & Related papers (2024-09-05T09:14:03Z) - xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart [13.812935743270517]
We propose xLSTM-UNet, a UNet structured deep learning neural network that leverages Vision-LSTM (xLSTM) as its backbone for medical image segmentation.
xLSTM is a recently proposed as the successor of Long Short-Term Memory (LSTM) networks.
Our findings demonstrate that xLSTM-UNet consistently surpasses the performance of leading CNN-based, Transformer-based, and Mamba-based segmentation networks.
arXiv Detail & Related papers (2024-07-01T17:59:54Z) - Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images [1.5954224931801726]
This study is the first attempt to evaluate the effectiveness of Vision-LSTM in the semantic segmentation of remotely sensed images.
Our study found that Vision-LSTM's performance in semantic segmentation was limited and generally inferior to Vision-Transformers-based and Vision-Mamba-based models in most comparative tests.
arXiv Detail & Related papers (2024-06-20T08:01:28Z) - Efficient Visual State Space Model for Image Deblurring [83.57239834238035]
Convolutional neural networks (CNNs) and Vision Transformers (ViTs) have achieved excellent performance in image restoration.
We propose a simple yet effective visual state space model (EVSSM) for image deblurring.
arXiv Detail & Related papers (2024-05-23T09:13:36Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS
Instance Segmentation [10.789826145990016]
This paper presents a deep learning framework for medical video segmentation.
Our framework explicitly extracts features from neighbouring frames across the temporal dimension.
It incorporates them with a temporal feature blender, which then tokenises the high-level-temporal feature to form a strong global feature encoded via a Swin Transformer.
arXiv Detail & Related papers (2023-02-22T12:09:39Z) - Learning from partially labeled data for multi-organ and tumor
segmentation [102.55303521877933]
We propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple datasets.
A dynamic head enables the network to accomplish multiple segmentation tasks flexibly.
We create a large-scale partially labeled Multi-Organ and Tumor benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors.
arXiv Detail & Related papers (2022-11-13T13:03:09Z) - Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS
Instance Segmentation [11.575821326313607]
We propose Video-TransUNet, a deep architecture for segmentation in medical CT videos constructed by integrating temporal feature blending into the TransUNet deep learning framework.
In particular, our approach amalgamates strong frame representation via a ResNet CNN backbone, multi-frame feature blending via a Temporal Context Module, and reconstructive capabilities for multiple targets via a UNet-based convolutional-deconal architecture with multiple heads.
arXiv Detail & Related papers (2022-08-17T14:28:58Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation.
We propose a novel framework that efficiently bridges a bf Convolutional neural network and a bf Transformer bf (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.