UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation
- URL: http://arxiv.org/abs/2107.00781v1
- Date: Fri, 2 Jul 2021 00:56:27 GMT
- Title: UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation
- Authors: Yunhe Gao, Mu Zhou, Dimitris Metaxas
- Abstract summary: The Transformer architecture has proven successful in a number of natural language processing tasks.
We present UTNet, a powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Transformer architecture has proven successful in a number of natural
language processing tasks. However, its applications to medical vision remain
largely unexplored. In this study, we present UTNet, a simple yet powerful
hybrid Transformer architecture that integrates self-attention into a
convolutional neural network for enhancing medical image segmentation. UTNet
applies self-attention modules in both the encoder and the decoder to capture
long-range dependencies at different scales with minimal overhead. To this end,
we propose an efficient self-attention mechanism, along with relative position
encoding, that significantly reduces the complexity of the self-attention
operation from $O(n^2)$ to approximately $O(n)$. A new self-attention decoder is
also proposed to recover fine-grained details from the skip connections in the
encoder. Our approach addresses the dilemma that Transformers require huge
amounts of data to learn vision inductive biases. Our hybrid layer design allows
Transformers to be initialized into convolutional networks without the need for
pre-training. We have evaluated UTNet on a multi-label, multi-vendor cardiac
magnetic resonance imaging cohort. UTNet demonstrates superior segmentation
performance and robustness compared with state-of-the-art approaches, holding the
promise to generalize well to other medical image segmentation tasks.
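One way to see how self-attention can drop from $O(n^2)$ to roughly $O(n)$: if keys and values are sub-sampled to a fixed number of $k$ tokens before attention, the score matrix is $n \times k$ instead of $n \times n$, so cost grows linearly in $n$ for fixed $k$. The NumPy sketch below illustrates that idea under stated assumptions: block average pooling stands in for whatever sub-sampling operator UTNet actually uses, relative position encoding is omitted, and the names `efficient_attention`, `avg_pool_tokens`, `k_h`, and `k_w` are hypothetical, not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(x, h, w, out_h, out_w):
    """Downsample an (h*w, d) row-major token map to (out_h*out_w, d)
    by block averaging. Assumption: h % out_h == 0 and w % out_w == 0.
    Hypothetical stand-in for the paper's key/value sub-sampling."""
    d = x.shape[1]
    grid = x.reshape(out_h, h // out_h, out_w, w // out_w, d)
    return grid.mean(axis=(1, 3)).reshape(out_h * out_w, d)

def efficient_attention(x, h, w, wq, wk, wv, k_h=8, k_w=8):
    """Single-head attention with keys/values reduced to k = k_h*k_w tokens.
    The score matrix is (n, k) rather than (n, n) for n = h*w tokens.
    Relative position encoding is deliberately left out of this sketch."""
    q = x @ wq                                   # (n, d) queries at full resolution
    k = avg_pool_tokens(x @ wk, h, w, k_h, k_w)  # (k, d) sub-sampled keys
    v = avg_pool_tokens(x @ wv, h, w, k_h, k_w)  # (k, d) sub-sampled values
    scores = q @ k.T / np.sqrt(q.shape[1])       # (n, k) scaled dot products
    return softmax(scores, axis=-1) @ v          # (n, d) output per query token

# Usage: a 32x32 feature map with 16 channels attends to only 8x8 = 64 tokens.
rng = np.random.default_rng(0)
h, w, d = 32, 32, 16
x = rng.standard_normal((h * w, d))
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out = efficient_attention(x, h, w, wq, wk, wv)
print(out.shape)  # (1024, 16)
```

Here the score matrix is 1024 × 64 rather than 1024 × 1024. Setting `k_h=h` and `k_w=w` makes the pooling an identity and recovers ordinary full self-attention, which is a convenient sanity check on the sketch.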
Related papers
- 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers (2023-10-11)
Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning.
The U-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tasks.
To address these limitations, researchers have turned to Transformers, renowned for their global self-attention mechanisms.
- MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation (2023-05-15)
MaxViT-UNet is a hybrid vision transformer (CNN-Transformer) for medical image segmentation.
The proposed Hybrid Decoder is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage.
The inclusion of multi-axis self-attention within each decoder stage significantly enhances the discriminating capacity between the object and background regions.
- TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model (2022-07-27)
Convolutional neural networks (CNNs) have been the prevailing technique in medical image processing.
We propose TransNorm, a novel deep segmentation framework that consolidates a Transformer module into both the encoder and the skip connections of the standard U-Net.
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet (2022-06-02)
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
- Less is More: Pay Less Attention in Vision Transformers (2021-05-29)
The Less attention vIsion Transformer (LIT) builds upon the observation that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection, and instance segmentation.
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (2021-05-12)
Swin-Unet is a Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation (2021-03-04)
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation (2021-02-21)
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model, which extends existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo), which further improves performance.
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation (2021-02-08)
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the U-shaped architecture, also known as U-Net, has become the de facto standard.
We propose TransUNet, which combines the merits of both Transformers and U-Net, as a strong alternative for medical image segmentation.