Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation
- URL: http://arxiv.org/abs/2102.10662v1
- Date: Sun, 21 Feb 2021 18:35:14 GMT
- Title: Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation
- Authors: Jeya Maria Jose Valanarasu, Poojan Oza, Ilker Hacihaliloglu, Vishal M.
Patel
- Abstract summary: We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
- Score: 73.98974074534497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the past decade, Deep Convolutional Neural Networks have been widely
adopted for medical image segmentation and shown to achieve adequate
performance. However, due to the inherent inductive biases present in
convolutional architectures, they lack an understanding of long-range dependencies
in the image. Recently proposed Transformer-based architectures that leverage
the self-attention mechanism encode long-range dependencies and learn
representations that are highly expressive. This motivates us to explore
Transformer-based solutions and study the feasibility of using
Transformer-based network architectures for medical image segmentation tasks.
The majority of existing Transformer-based network architectures proposed for
vision applications require large-scale datasets to train properly. However,
compared to the datasets used for vision applications, the number of data samples
available for medical imaging is relatively small, making it difficult to
efficiently train transformers for medical applications. To this end, we propose a Gated
Axial-Attention model which extends the existing architectures by introducing
an additional control mechanism in the self-attention module. Furthermore, to
train the model effectively on medical images, we propose a Local-Global
training strategy (LoGo) which further improves the performance. Specifically,
we operate on the whole image and patches to learn global and local features,
respectively. The proposed Medical Transformer (MedT) is evaluated on three
different medical image segmentation datasets and it is shown that it achieves
better performance than the convolutional and other related transformer-based
architectures. Code: https://github.com/jeya-maria-jose/Medical-Transformer
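The gating idea described in the abstract can be made concrete with a short sketch. The block below is a minimal, simplified illustration of a gated axial-attention layer in PyTorch: the module name, single-head design, and absolute (rather than relative) positional terms are assumptions made for readability and do not reproduce the authors' released implementation at the repository linked above.

```python
# Minimal sketch of a gated axial-attention layer (illustrative only; names,
# shapes and the simplified positional terms are assumptions, not the
# reference implementation from the linked repository).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedAxialAttention1D(nn.Module):
    """Self-attention along one spatial axis, with learnable gates that control
    how much the (possibly unreliable) positional terms contribute."""

    def __init__(self, dim: int, axis_len: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        # Positional embeddings for queries, keys and values along the axis.
        self.r_q = nn.Parameter(torch.randn(axis_len, dim) * 0.02)
        self.r_k = nn.Parameter(torch.randn(axis_len, dim) * 0.02)
        self.r_v = nn.Parameter(torch.randn(axis_len, dim) * 0.02)
        # Gates initialised so attention starts close to a purely content-based
        # form and only relies on positional cues when training finds them useful.
        self.g_q = nn.Parameter(torch.zeros(1))
        self.g_k = nn.Parameter(torch.zeros(1))
        self.g_v1 = nn.Parameter(torch.ones(1))
        self.g_v2 = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, axis_len, dim) -- one row or one column of the feature map.
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        content = torch.einsum("bid,bjd->bij", q, k)                   # q_i . k_j
        pos_q = torch.einsum("bid,jd->bij", q, self.r_q)               # q_i . r_j^q
        pos_k = torch.einsum("bjd,jd->bj", k, self.r_k).unsqueeze(1)   # k_j . r_j^k
        logits = (content + self.g_q * pos_q + self.g_k * pos_k) * self.scale
        attn = F.softmax(logits, dim=-1)
        out_v = torch.einsum("bij,bjd->bid", attn, v)                  # gated value term
        out_r = torch.einsum("bij,jd->bid", attn, self.r_v)            # gated positional term
        return self.g_v1 * out_v + self.g_v2 * out_r
```

To cover a 2D feature map, such a layer would be applied once along the height axis and once along the width axis, which is what keeps axial attention tractable compared to full 2D self-attention.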
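The Local-Global (LoGo) training strategy can likewise be sketched as a two-branch forward pass: a global branch that sees the whole image and a local branch that processes patches whose outputs are re-assembled and fused. The function below is a hypothetical illustration; `global_branch`, `local_branch`, the patch size, and the additive fusion are placeholders rather than the paper's exact design.

```python
# Hypothetical sketch of the LoGo forward pass. Both branch networks are
# assumed to preserve spatial resolution and emit the same channel count,
# and image height/width are assumed divisible by the patch size.
import torch


def logo_forward(image: torch.Tensor, global_branch, local_branch, patch: int = 64) -> torch.Tensor:
    b, c, h, w = image.shape
    # Global branch: operate on the whole image to capture long-range context.
    global_out = global_branch(image)
    # Local branch: operate on non-overlapping patches to capture fine detail.
    tiles = image.unfold(2, patch, patch).unfold(3, patch, patch)      # (b, c, H/p, W/p, p, p)
    tiles = tiles.contiguous().view(b, c, -1, patch, patch)            # (b, c, n, p, p)
    local_outs = [local_branch(tiles[:, :, i]) for i in range(tiles.shape[2])]
    grid = torch.stack(local_outs, dim=2)                              # (b, c', n, p, p)
    grid = grid.view(b, -1, h // patch, w // patch, patch, patch)
    local_out = grid.permute(0, 1, 2, 4, 3, 5).reshape(b, -1, h, w)    # re-assemble patch grid
    # Fuse coarse global context with fine local detail.
    return global_out + local_out
```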
Related papers
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical
Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but is also capable of capturing long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z) - 3D TransUNet: Advancing Medical Image Segmentation through Vision
Transformers [40.21263511313524]
Medical image segmentation plays a crucial role in advancing healthcare systems for disease diagnosis and treatment planning.
The u-shaped architecture, popularly known as U-Net, has proven highly successful for various medical image segmentation tasks.
To address the limitations of convolutional architectures, researchers have turned to Transformers, renowned for their global self-attention mechanisms.
arXiv Detail & Related papers (2023-10-11T18:07:19Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - PHTrans: Parallelly Aggregating Global and Local Representations for
Medical Image Segmentation [7.140322699310487]
We propose a novel hybrid architecture for medical image segmentation called PHTrans.
PHTrans hybridizes Transformer and CNN in parallel within its main building blocks to produce hierarchical representations from global and local features.
arXiv Detail & Related papers (2022-03-09T08:06:56Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - Pyramid Medical Transformer for Medical Image Segmentation [8.157373686645318]
We develop a novel method to integrate multi-scale attention and CNN feature extraction using a pyramidal network architecture, namely Pyramid Medical Transformer (PMTrans).
Experimental results on two medical image datasets, gland segmentation and MoNuSeg datasets, showed that PMTrans outperformed the latest CNN-based and transformer-based models for medical image segmentation.
arXiv Detail & Related papers (2021-04-29T23:57:20Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.