CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation
- URL: http://arxiv.org/abs/2103.03024v1
- Date: Thu, 4 Mar 2021 13:34:22 GMT
- Title: CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation
- Authors: Yutong Xie, Jianpeng Zhang, Chunhua Shen, Yong Xia
- Abstract summary: Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation.
We propose a novel framework that efficiently bridges a bf Convolutional neural network and a bf Transformer bf (CoTr) for accurate 3D medical image segmentation.
- Score: 95.51455777713092
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional neural networks (CNNs) have been the de facto standard for
nowadays 3D medical image segmentation. The convolutional operations used in
these networks, however, inevitably have limitations in modeling the long-range
dependency due to their inductive bias of locality and weight sharing. Although
Transformer was born to address this issue, it suffers from extreme
computational and spatial complexities in processing high-resolution 3D feature
maps. In this paper, we propose a novel framework that efficiently bridges a
{\bf Co}nvolutional neural network and a {\bf Tr}ansformer {\bf (CoTr)} for
accurate 3D medical image segmentation. Under this framework, the CNN is
constructed to extract feature representations and an efficient deformable
Transformer (DeTrans) is built to model the long-range dependency on the
extracted feature maps. Different from the vanilla Transformer which treats all
image positions equally, our DeTrans pays attention only to a small set of key
positions by introducing the deformable self-attention mechanism. Thus, the
computational and spatial complexities of DeTrans have been greatly reduced,
making it possible to process the multi-scale and high-resolution feature maps,
which are usually of paramount importance for image segmentation. We conduct an
extensive evaluation on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV)
dataset that covers 11 major human organs. The results indicate that our CoTr
leads to a substantial performance improvement over other CNN-based,
transformer-based, and hybrid methods on the 3D multi-organ segmentation task.
Code is available at \def\UrlFont{\rm\small\ttfamily}
\url{https://github.com/YtongXie/CoTr}
Related papers
- TransResNet: Integrating the Strengths of ViTs and CNNs for High Resolution Medical Image Segmentation via Feature Grafting [6.987177704136503]
High-resolution images are preferable in medical imaging domain as they significantly improve the diagnostic capability of the underlying method.
Most of the existing deep learning-based techniques for medical image segmentation are optimized for input images having small spatial dimensions and perform poorly on high-resolution images.
We propose a parallel-in-branch architecture called TransResNet, which incorporates Transformer and CNN in a parallel manner to extract features from multi-resolution images independently.
arXiv Detail & Related papers (2024-10-01T18:22:34Z) - TEC-Net: Vision Transformer Embrace Convolutional Neural Networks for
Medical Image Segmentation [20.976167468217387]
We propose vision Transformer embrace convolutional neural networks for medical image segmentation (TEC-Net)
Our network has two advantages. First, dynamic deformable convolution (DDConv) is designed in the CNN branch, which not only overcomes the difficulty of adaptive feature extraction using fixed-size convolution kernels, but also solves the defect that different inputs share the same convolution kernel parameters.
Experimental results show that the proposed TEC-Net provides better medical image segmentation results than SOTA methods including CNN and Transformer networks.
arXiv Detail & Related papers (2023-06-07T01:14:16Z) - CiT-Net: Convolutional Neural Networks Hand in Hand with Vision
Transformers for Medical Image Segmentation [10.20771849219059]
We propose a novel hybrid architecture of convolutional neural networks (CNNs) and vision Transformers (CiT-Net) for medical image segmentation.
Our CiT-Net provides better medical image segmentation results than popular SOTA methods.
arXiv Detail & Related papers (2023-06-06T03:22:22Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Dynamic Linear Transformer for 3D Biomedical Image Segmentation [2.440109381823186]
Transformer-based neural networks have surpassed promising performance on many biomedical image segmentation tasks.
Main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism.
We propose a novel transformer architecture for 3D medical image segmentation using an encoder-decoder style architecture with linear complexity.
arXiv Detail & Related papers (2022-06-01T21:15:01Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
tokenized image patches are fed into the Transformer-based U-shaped decoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.