HiFormer: Hierarchical Multi-scale Representations Using Transformers
for Medical Image Segmentation
- URL: http://arxiv.org/abs/2207.08518v1
- Date: Mon, 18 Jul 2022 11:30:06 GMT
- Title: HiFormer: Hierarchical Multi-scale Representations Using Transformers
for Medical Image Segmentation
- Authors: Moein Heidari, Amirhossein Kazerouni, Milad Soltany, Reza Azad, Ehsan
Khodapanah Aghdam, Julien Cohen-Adad, Dorit Merhof
- Abstract summary: HiFormer is a novel method that efficiently bridges a CNN and a transformer for medical image segmentation.
To secure a fine fusion of global and local features, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure.
- Score: 3.478921293603811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks (CNNs) have been the consensus for medical
image segmentation tasks. However, they suffer from the limitation in modeling
long-range dependencies and spatial correlations due to the nature of
convolution operation. Although transformers were first developed to address
this issue, they fail to capture low-level features. In contrast, it is
demonstrated that both local and global features are crucial for dense
prediction, such as segmenting in challenging contexts. In this paper, we
propose HiFormer, a novel method that efficiently bridges a CNN and a
transformer for medical image segmentation. Specifically, we design two
multi-scale feature representations using the seminal Swin Transformer module
and a CNN-based encoder. To secure a fine fusion of global and local features
obtained from the two aforementioned representations, we propose a Double-Level
Fusion (DLF) module in the skip connection of the encoder-decoder structure.
Extensive experiments on various medical image segmentation datasets
demonstrate the effectiveness of HiFormer over other CNN-based,
transformer-based, and hybrid methods in terms of computational complexity, and
quantitative and qualitative results. Our code is publicly available at:
https://github.com/amirhossein-kz/HiFormer
Related papers
- ParaTransCNN: Parallelized TransCNN Encoder for Medical Image
Segmentation [7.955518153976858]
We propose an advanced 2D feature extraction method by combining the convolutional neural network and Transformer architectures.
Our method is shown to achieve better segmentation accuracy, especially on small organs.
arXiv Detail & Related papers (2024-01-27T05:58:36Z)
- CATS v2: Hybrid encoders for robust medical segmentation [12.194439938007672]
Convolutional Neural Networks (CNNs) have exhibited strong performance in medical image segmentation tasks.
However, due to the limited field of view of convolution kernel, it is hard for CNNs to fully represent global information.
We propose CATS v2 with hybrid encoders, which better leverage both local and global information.
arXiv Detail & Related papers (2023-08-11T20:21:54Z)
- CiT-Net: Convolutional Neural Networks Hand in Hand with Vision
Transformers for Medical Image Segmentation [10.20771849219059]
We propose a novel hybrid architecture of convolutional neural networks (CNNs) and vision Transformers (CiT-Net) for medical image segmentation.
Our CiT-Net provides better medical image segmentation results than popular SOTA methods.
arXiv Detail & Related papers (2023-06-06T03:22:22Z)
- ConvFormer: Combining CNN and Transformer for Medical Image Segmentation [17.88894109620463]
We propose a hierarchical CNN and Transformer hybrid architecture, called ConvFormer, for medical image segmentation.
Our ConvFormer, trained from scratch, outperforms various CNN- or Transformer-based architectures, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-11-15T23:11:22Z)
- ConvTransSeg: A Multi-resolution Convolution-Transformer Network for
Medical Image Segmentation [14.485482467748113]
We propose a hybrid encoder-decoder segmentation model (ConvTransSeg).
It consists of a multi-layer CNN as the encoder for feature learning and the corresponding multi-level Transformer as the decoder for segmentation prediction.
Our method achieves the best performance in terms of Dice coefficient and average symmetric surface distance measures with low model complexity and memory consumption.
arXiv Detail & Related papers (2022-10-13T14:59:23Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolution Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is a Unet-like pure Transformer for medical image segmentation.
The tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which combines the merits of both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.