Related papers: EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices

EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices

URL: http://arxiv.org/abs/2410.15036v1
Date: Sat, 19 Oct 2024 08:42:53 GMT
Title: EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices
Authors: Xin Li, Wenhui Zhu, Xuanzhao Dong, Oana M. Dumitrascu, Yalin Wang,
Abstract summary: We propose EViT-UNet, an efficient ViT-based segmentation network that reduces computational complexity while maintaining accuracy. EViT-UNet is built on a U-shaped architecture, comprising an encoder, decoder, bottleneck layer, and skip connections. Experimental results demonstrate that EViT-UNet achieves high accuracy in medical image segmentation while significantly reducing computational complexity.
Score: 5.307205032859535
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: With the rapid development of deep learning, CNN-based U-shaped networks have succeeded in medical image segmentation and are widely applied for various tasks. However, their limitations in capturing global features hinder their performance in complex segmentation tasks. The rise of Vision Transformer (ViT) has effectively compensated for this deficiency of CNNs and promoted the application of ViT-based U-networks in medical image segmentation. However, the high computational demands of ViT make it unsuitable for many medical devices and mobile platforms with limited resources, restricting its deployment on resource-constrained and edge devices. To address this, we propose EViT-UNet, an efficient ViT-based segmentation network that reduces computational complexity while maintaining accuracy, making it ideal for resource-constrained medical devices. EViT-UNet is built on a U-shaped architecture, comprising an encoder, decoder, bottleneck layer, and skip connections, combining convolutional operations with self-attention mechanisms to optimize efficiency. Experimental results demonstrate that EViT-UNet achieves high accuracy in medical image segmentation while significantly reducing computational complexity.

Related papers

Distilled Pooling Transformer Encoder for Efficient Realistic Image Dehazing [0.0]
This paper proposes a lightweight neural network designed for realistic image dehazing, utilizing a Distilled Pooling Transformer, named DPTE-Net. Experimental results on various benchmark datasets have shown that the proposed DPTE-Net can achieve competitive dehazing performance when compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-12-18T14:16:23Z)
CSWin-UNet: Transformer UNet with Cross-Shaped Windows for Medical Image Segmentation [22.645013853519]
CSWin-UNet is a novel U-shaped segmentation method that incorporates the CSWin self-attention mechanism into the UNet. Our empirical evaluations on diverse datasets, including synapse multi-organ CT, cardiac MRI, and skin lesions, demonstrate that CSWin-UNet maintains low model complexity while delivering high segmentation accuracy.
arXiv Detail & Related papers (2024-07-25T14:25:17Z)
CMUNeXt: An Efficient Medical Image Segmentation Network based on Large Kernel and Skip Fusion [11.434576556863934]
CMUNeXt is an efficient fully convolutional lightweight medical image segmentation network. It enables fast and accurate auxiliary diagnosis in real scene scenarios.
arXiv Detail & Related papers (2023-08-02T15:54:00Z)
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation. It simultaneously learns global semantic information and local spatial-detailed features. Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation [79.265315267391]
We propose a simple and compact ViT architecture called Universal Vision Transformer (UViT) UViT achieves strong performance on object detection and instance segmentation tasks.
arXiv Detail & Related papers (2021-12-17T20:11:56Z)
LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation [6.2059756782278965]
We propose LeViT-UNet, which integrates a LeViT Transformer module into the U-Net architecture. Specifically, we use LeViT as the encoder of the LeViT-UNet, which better trades off the accuracy and efficiency of the Transformer block.
arXiv Detail & Related papers (2021-07-19T05:48:51Z)
UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation [6.646135062704341]
Transformer architecture has emerged to be successful in a number of natural language processing tasks. We present UTNet, a powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation.
arXiv Detail & Related papers (2021-07-02T00:56:27Z)
Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks. specially trained CNNs that employ parametrised early exits along their depth to save during inference on easier samples. We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation. We propose a novel framework that efficiently bridges a bf Convolutional neural network and a bf Transformer bf (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks. We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module. To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems. On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard. We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.