MobileUtr: Revisiting the relationship between light-weight CNN and
Transformer for efficient medical image segmentation
- URL: http://arxiv.org/abs/2312.01740v1
- Date: Mon, 4 Dec 2023 09:04:05 GMT
- Title: MobileUtr: Revisiting the relationship between light-weight CNN and
Transformer for efficient medical image segmentation
- Authors: Fenghe Tang, Bingkun Nian, Jianrui Ding, Quan Quan, Jie Yang, Wei Liu,
S.Kevin Zhou
- Abstract summary: This work revisits the relationship between CNNs and Transformers in lightweight universal networks for medical image segmentation.
In order to leverage the inductive bias inherent in CNNs, we abstract a Transformer-like lightweight CNN block (ConvUtr) as the patch embedding of ViTs.
We build an efficient medical image segmentation model (MobileUtr) based on CNN and Transformer.
- Score: 25.056401513163493
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the scarcity and specific imaging characteristics in medical images,
light-weighting Vision Transformers (ViTs) for efficient medical image
segmentation is a significant challenge, and current studies have not yet paid
attention to this issue. This work revisits the relationship between CNNs and
Transformers in lightweight universal networks for medical image segmentation,
aiming to integrate the advantages of both worlds at the infrastructure design
level. In order to leverage the inductive bias inherent in CNNs, we abstract a
Transformer-like lightweight CNN block (ConvUtr) as the patch embedding of
ViTs, feeding the Transformer with denoised, non-redundant and highly condensed
semantic information. Moreover, an adaptive Local-Global-Local (LGL) block is
introduced to facilitate efficient local-to-global information flow exchange,
maximizing the Transformer's global context information extraction capabilities.
Finally, we build an efficient medical image segmentation model (MobileUtr)
based on CNN and Transformer. Extensive experiments on five public medical
image datasets with three different modalities demonstrate the superiority of
MobileUtr over the state-of-the-art methods, while boasting lighter weights and
lower computational cost. Code is available at
https://github.com/FengheTan9/MobileUtr.
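
For a concrete picture of the design described in the abstract, the following is a minimal PyTorch sketch of the core idea: a light-weight, Transformer-like convolutional stem used as the patch embedding of a ViT-style encoder, so the Transformer only sees compact, already-abstracted tokens. The module names (ConvStem, TinyUtr), channel widths, and depths are illustrative assumptions, the adaptive LGL block is not modeled, and this is not the authors' implementation; see the repository above for the real MobileUtr code.

```python
# Illustrative sketch only: a light-weight convolutional stem used as the
# patch embedding of a small Transformer encoder, in the spirit of the
# CNN-as-patch-embedding idea described in the abstract. Names (ConvStem,
# TinyUtr) and all hyper-parameters are assumptions; see the official
# repository for the real MobileUtr implementation.
import torch
import torch.nn as nn


class ConvStem(nn.Module):
    """Depthwise-separable conv blocks that downsample the image 8x and
    hand the Transformer a compact, denoised token map."""

    def __init__(self, in_ch=3, dims=(16, 32, 64)):
        super().__init__()
        layers, prev = [], in_ch
        for d in dims:
            layers += [
                nn.Conv2d(prev, prev, 3, stride=2, padding=1, groups=prev),  # depthwise, /2
                nn.Conv2d(prev, d, 1),                                       # pointwise
                nn.BatchNorm2d(d),
                nn.GELU(),
            ]
            prev = d
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)  # (B, dims[-1], H/8, W/8)


class TinyUtr(nn.Module):
    """Conv stem -> Transformer encoder over the stem's tokens -> 1x1 head."""

    def __init__(self, in_ch=3, num_classes=1, dim=64, depth=2, heads=4):
        super().__init__()
        self.stem = ConvStem(in_ch, dims=(16, 32, dim))
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):
        f = self.stem(x)                               # (B, C, h, w)
        B, C, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)          # (B, h*w, C)
        tokens = self.encoder(tokens)                  # global context over tokens
        f = tokens.transpose(1, 2).reshape(B, C, h, w)
        logits = self.head(f)
        return nn.functional.interpolate(              # back to input resolution
            logits, size=x.shape[-2:], mode="bilinear", align_corners=False)


if __name__ == "__main__":
    model = TinyUtr()
    out = model(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1, 224, 224])
```
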
Related papers
- CNN-Transformer Rectified Collaborative Learning for Medical Image Segmentation [60.08541107831459]
This paper proposes a CNN-Transformer rectified collaborative learning framework to learn stronger CNN-based and Transformer-based models for medical image segmentation.
Specifically, we propose a rectified logit-wise collaborative learning (RLCL) strategy which introduces the ground truth to adaptively select and rectify the wrong regions in student soft labels.
We also propose a class-aware feature-wise collaborative learning (CFCL) strategy to achieve effective knowledge transfer between CNN-based and Transformer-based models in the feature space.
arXiv Detail & Related papers (2024-08-25T01:27:35Z)
- CMUNeXt: An Efficient Medical Image Segmentation Network based on Large Kernel and Skip Fusion [11.434576556863934]
CMUNeXt is an efficient fully convolutional lightweight medical image segmentation network.
It enables fast and accurate auxiliary diagnosis in real-world scenarios.
arXiv Detail & Related papers (2023-08-02T15:54:00Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, existing designs neglect the Transformer's need to incorporate contextual information in order to extract features dynamically.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- PHTrans: Parallelly Aggregating Global and Local Representations for Medical Image Segmentation [7.140322699310487]
We propose a novel hybrid architecture for medical image segmentation called PHTrans.
PHTrans parallelly hybridizes Transformer and CNN in main building blocks to produce hierarchical representations from global and local features.
arXiv Detail & Related papers (2022-03-09T08:06:56Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into a Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Pyramid Medical Transformer for Medical Image Segmentation [8.157373686645318]
We develop a novel method to integrate multi-scale attention and CNN feature extraction using a pyramidal network architecture, namely Pyramid Medical Transformer (PMTrans).
Experimental results on two medical image datasets, gland segmentation and MoNuSeg datasets, showed that PMTrans outperformed the latest CNN-based and transformer-based models for medical image segmentation.
arXiv Detail & Related papers (2021-04-29T23:57:20Z)
- TransMed: Transformers Advance Multi-modal Medical Image Classification [4.500880052705654]
Convolutional neural networks (CNNs) have shown very competitive performance in medical image analysis tasks.
Transformers have been applied to computer vision and achieved remarkable success in large-scale datasets.
TransMed combines the advantages of CNN and transformer to efficiently extract low-level features of images.
arXiv Detail & Related papers (2021-03-10T08:57:53Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance. A simplified sketch of the gated axial-attention idea is given after this list.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
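
Returning to the Medical Transformer entry above: the following is a rough sketch of axial self-attention with learnable gates, a simplified stand-in for the "additional control mechanism in the self-attention module" that gated axial-attention introduces. In the actual paper the gates act on relative positional terms inside the axial attention; here a single learnable scalar per axis gates the attention output instead, and all names and hyper-parameters are assumptions, not the authors' implementation.

```python
# Rough sketch: axial self-attention (rows, then columns) with learnable
# gates. This simplifies the gated axial-attention idea; the real model
# gates relative positional terms, whereas here one scalar per axis gates
# the attention output, which is an assumption made for brevity.
import torch
import torch.nn as nn


class GatedAxialAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Gates start at zero so the block initially passes features through
        # unchanged and learns how much attention to mix in.
        self.row_gate = nn.Parameter(torch.zeros(1))
        self.col_gate = nn.Parameter(torch.zeros(1))

    @staticmethod
    def _attend(x, attn):
        # x: (batch, length, dim) -> self-attention along the length axis
        out, _ = attn(x, x, x, need_weights=False)
        return out

    def forward(self, x):
        # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Attention along rows: each row of length W is one sequence.
        rows = x.permute(0, 2, 3, 1).reshape(B * H, W, C)
        rows = self._attend(rows, self.row_attn)
        rows = rows.reshape(B, H, W, C).permute(0, 3, 1, 2)
        x = x + self.row_gate * rows
        # Attention along columns: each column of length H is one sequence.
        cols = x.permute(0, 3, 2, 1).reshape(B * W, H, C)
        cols = self._attend(cols, self.col_attn)
        cols = cols.reshape(B, W, H, C).permute(0, 3, 2, 1)
        return x + self.col_gate * cols


if __name__ == "__main__":
    block = GatedAxialAttention(dim=32)
    y = block(torch.randn(2, 32, 16, 16))
    print(y.shape)  # torch.Size([2, 32, 16, 16])
```
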