AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation
- URL: http://arxiv.org/abs/2110.10403v1
- Date: Wed, 20 Oct 2021 06:47:28 GMT
- Title: AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation
- Authors: Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie
- Abstract summary: Recent advances in transformer-based models have drawn attention to exploring such techniques in medical image segmentation.
We propose Axial Fusion Transformer UNet (AFTer-UNet), which combines convolutional layers' strength in extracting detailed features with transformers' strength in long-sequence modeling.
It has fewer parameters and requires less GPU memory to train than previous transformer-based models.
- Score: 19.53151547706724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in transformer-based models have drawn attention to exploring
these techniques in medical image segmentation, especially in conjunction with
the U-Net model (or its variants), which has shown great success in medical
image segmentation, under both 2D and 3D settings. Current 2D-based methods
either directly replace convolutional layers with pure transformers or consider
a transformer as an additional intermediate encoder between the encoder and
decoder of U-Net. However, these approaches only consider attention
encoding within a single slice and do not utilize the axial-axis information
naturally provided by a 3D volume. In the 3D setting, convolution on
volumetric data and transformers both consume large amounts of GPU memory.
One has to either downsample the image or use cropped local patches to
reduce GPU memory usage, which limits performance. In this paper, we propose
Axial Fusion Transformer UNet (AFTer-UNet), which combines convolutional
layers' strength in extracting detailed features with transformers' strength
in long-sequence modeling. It considers both intra-slice and inter-slice
long-range cues to guide the segmentation. Meanwhile, it has fewer parameters
and requires less GPU memory to train than previous transformer-based models.
Extensive experiments on three multi-organ segmentation datasets demonstrate
that our method outperforms current state-of-the-art methods.
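As a rough illustration of the axial-fusion idea, here is a minimal PyTorch sketch, assuming a CNN encoder has already produced per-slice feature maps; shapes and module names are hypothetical, not the paper's exact architecture. Attention runs first within each slice, then along the axial axis at each spatial position:

```python
import torch
import torch.nn as nn

class AxialFusionBlock(nn.Module):
    """Intra-slice attention over the H*W positions of each slice, then
    inter-slice attention over the D axial positions at each pixel."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                     # x: (B, D, H, W, C)
        B, D, H, W, C = x.shape
        # intra-slice: tokens are the H*W positions of a single slice
        t = x.reshape(B * D, H * W, C)
        n = self.norm1(t)
        t = t + self.intra(n, n, n)[0]
        # inter-slice: tokens are the D axial positions at one (h, w) location
        t = t.reshape(B, D, H, W, C).permute(0, 2, 3, 1, 4).reshape(B * H * W, D, C)
        n = self.norm2(t)
        t = t + self.inter(n, n, n)[0]
        return t.reshape(B, H, W, D, C).permute(0, 3, 1, 2, 4)

feats = torch.randn(1, 8, 16, 16, 64)         # 8 slices of 16x16 feature maps
fused = AxialFusionBlock(64)(feats)           # same shape, axially fused
```

Splitting attention this way keeps each sequence short (H*W or D tokens) rather than D*H*W, which is where the memory savings over full 3D attention come from.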
Related papers
- Cross-domain and Cross-dimension Learning for Image-to-Graph
Transformers [50.576354045312115]
Direct image-to-graph transformation is a challenging task that requires solving object detection and relationship prediction in a single model.
We introduce a set of methods enabling cross-domain and cross-dimension transfer learning for image-to-graph transformers.
We demonstrate our method's utility in cross-domain and cross-dimension experiments, where we pretrain our models on 2D satellite images before applying them to vastly different target domains in 2D and 3D.
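The summary does not spell out the transfer mechanism, but a common cross-dimension trick (not necessarily this paper's) is to inflate pretrained 2D kernels into 3D; a minimal sketch:

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d, depth=3):
    """Copy a pretrained 2D kernel along a new depth axis and rescale, so the
    3D layer initially reproduces the 2D response on depth-constant input.
    A generic inflation trick, not necessarily this paper's method."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       (depth, *conv2d.kernel_size),
                       padding=(depth // 2, *conv2d.padding))
    w2d = conv2d.weight.data                  # (out, in, kH, kW)
    conv3d.weight.data = w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
    if conv2d.bias is not None:
        conv3d.bias.data = conv2d.bias.data.clone()
    return conv3d

conv3d = inflate_conv2d_to_3d(nn.Conv2d(3, 16, 3, padding=1))
```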
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation [15.94370954641629]
2.5D-based segmentation models often treat each slice equally, failing to effectively learn and exploit inter-slice information.
A novel Momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue.
The MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), establishing a new state of the art with DSC of 85.63%, 92.19%, and 85.43%, respectively.
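The "momentum encoder" in the title is, in the usual sense, an EMA copy of the online encoder; a minimal sketch of that standard update (module names assumed; MOSformer's fusion transformer itself is not shown):

```python
import copy
import torch

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m=0.999):
    """EMA update: the momentum encoder slowly tracks the online encoder,
    giving consistent embeddings across neighbouring slices."""
    for p, mp in zip(encoder.parameters(), momentum_encoder.parameters()):
        mp.data.mul_(m).add_(p.data, alpha=1 - m)

encoder = torch.nn.Linear(32, 32)             # stand-in for a slice encoder
momentum_encoder = copy.deepcopy(encoder).requires_grad_(False)
momentum_update(encoder, momentum_encoder)    # call once per training step
```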
arXiv Detail & Related papers (2024-01-22T11:25:59Z)
- MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder [0.0]
We propose a Medical Image Segmentation Transformer (MIST) incorporating a novel Convolutional Attention Mixing (CAM) decoder.
MIST has two parts: a pre-trained multi-axis vision transformer (MaxViT) is used as an encoder, and the encoded feature representation is passed through the CAM decoder for segmenting the images.
To enhance spatial information gain, deep and shallow convolutions are used for feature extraction and receptive field expansion.
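As a toy stand-in for the idea of mixing shallow and deep convolutions with an attention gate (layers and names are assumptions, not MIST's actual CAM decoder):

```python
import torch
import torch.nn as nn

class ConvMixingBlock(nn.Module):
    """Mixes a shallow 3x3 conv with a dilated large-receptive-field conv,
    then reweights channels with a squeeze-and-excitation style gate as a
    lightweight form of attention."""
    def __init__(self, ch):
        super().__init__()
        self.shallow = nn.Conv2d(ch, ch, 3, padding=1)
        self.deep = nn.Conv2d(ch, ch, 3, padding=3, dilation=3)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        mixed = self.shallow(x) + self.deep(x)  # small + large receptive field
        return mixed * self.gate(mixed)         # channel-wise attention gate

y = ConvMixingBlock(64)(torch.randn(1, 64, 32, 32))   # (1, 64, 32, 32)
```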
arXiv Detail & Related papers (2023-10-30T18:07:57Z)
- Memory transformers for full context and high-resolution 3D Medical Segmentation [76.93387214103863]
This paper introduces the Full resolutIoN mEmory (FINE) transformer to overcome the memory limits of high-resolution 3D segmentation.
The core idea behind FINE is to learn memory tokens that indirectly model full-range interactions.
Experiments on the BCV image segmentation dataset show better performance than state-of-the-art CNN and transformer baselines.
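A minimal sketch of the memory-token idea, assuming local windows of tokens; FINE's actual scheme (per-window memory shared and updated across the volume) is simplified here:

```python
import torch
import torch.nn as nn

class MemoryTokenAttention(nn.Module):
    """Prepends a few learnable memory tokens to a window's tokens; because
    the memory tokens are shared across windows, long-range context can flow
    through them without attending over the full volume directly."""
    def __init__(self, dim, n_mem=8, heads=4):
        super().__init__()
        self.mem = nn.Parameter(torch.randn(1, n_mem, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):                 # tokens: (B, N, C), one window
        B = tokens.shape[0]
        z = torch.cat([self.mem.expand(B, -1, -1), tokens], dim=1)
        z, _ = self.attn(z, z, z)
        return z[:, self.mem.shape[1]:]        # drop memory slots on output

out = MemoryTokenAttention(64)(torch.randn(2, 128, 64))   # (2, 128, 64)
```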
arXiv Detail & Related papers (2022-10-11T10:11:05Z)
- Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding [62.502694656615496]
We present Progressive Point Patch Embedding and a new point cloud Transformer model, PViT.
PViT shares the same backbone as the standard Transformer but is shown to be less data-hungry, achieving performance comparable to the state of the art.
We formulate a simple yet effective pipeline dubbed "Pix4Point" that allows harnessing Transformers pretrained in the image domain to enhance downstream point cloud understanding.
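A rough sketch of the pipeline's shape, i.e. turning grouped points into tokens a standard Transformer can consume; the progressive, multi-stage embedding and the weight-transfer details are assumptions left out here:

```python
import torch
import torch.nn as nn

class PointPatchEmbed(nn.Module):
    """Reduces each local group of points to one token with a shared MLP and
    max pooling, so a plain Transformer encoder can process the result."""
    def __init__(self, dim=192):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, dim), nn.GELU(),
                                 nn.Linear(dim, dim))

    def forward(self, pts):                    # pts: (B, G, K, 3) point groups
        return self.mlp(pts).max(dim=2).values # (B, G, dim), one token per group

tokens = PointPatchEmbed()(torch.randn(2, 64, 16, 3))
# a standard encoder; in Pix4Point its weights come from image pretraining
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(192, nhead=4, batch_first=True), num_layers=2)
features = backbone(tokens)                    # (2, 64, 192)
```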
arXiv Detail & Related papers (2022-08-25T17:59:29Z)
- Cats: Complementary CNN and Transformer Encoders for Segmentation [13.288195115791758]
We propose a model with dual encoders for 3D biomedical image segmentation.
We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results.
Compared to the state-of-the-art models with and without transformers on each task, our proposed method obtains higher Dice scores across the board.
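A 2D toy version of the dual-encoder fusion idea (Cats itself is 3D; here the transformer branch consumes the CNN features rather than a separate patch embedding, and fusion-by-addition is an assumption):

```python
import torch
import torch.nn as nn

class DualEncoderSeg(nn.Module):
    """A convolutional branch extracts local features; a transformer branch
    adds global context; the two are fused before a 1x1-conv decoder head."""
    def __init__(self, dim=64, n_cls=4):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, dim, 3, padding=1), nn.ReLU())
        self.tr = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.head = nn.Conv2d(dim, n_cls, 1)

    def forward(self, x):                      # x: (B, 1, H, W)
        B, _, H, W = x.shape
        c = self.cnn(x)                        # local convolutional features
        t = self.tr(c.flatten(2).transpose(1, 2))      # global attention
        fused = c + t.transpose(1, 2).reshape(B, -1, H, W)
        return self.head(fused)                # per-pixel class logits

logits = DualEncoderSeg()(torch.randn(1, 1, 32, 32))   # (1, 4, 32, 32)
```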
arXiv Detail & Related papers (2022-08-24T14:25:11Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local, spatially detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
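Self-distillation usually means the deepest branch's softened predictions supervising shallower ones; a generic sketch of such a loss (not necessarily MISSU's exact formulation):

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between a shallow branch's softened predictions and the
    deepest branch's, letting local features absorb global semantics without
    an external teacher network."""
    s = F.log_softmax(student_logits / T, dim=1)
    t = F.softmax(teacher_logits.detach() / T, dim=1)   # teacher not updated
    return F.kl_div(s, t, reduction="batchmean") * T * T

loss = self_distillation_loss(torch.randn(2, 4, 8, 8), torch.randn(2, 4, 8, 8))
```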
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Dynamic Linear Transformer for 3D Biomedical Image Segmentation [2.440109381823186]
Transformer-based neural networks have shown promising performance on many biomedical image segmentation tasks.
The main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism.
We propose a novel transformer architecture for 3D medical image segmentation using an encoder-decoder style architecture with linear complexity.
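The standard kernelized linear-attention formulation shows how the quadratic term disappears; the paper's "dynamic" variant presumably differs in detail:

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """With a positive feature map phi (here elu + 1), attention becomes
    phi(Q) (phi(K)^T V): the d x d summary phi(K)^T V is independent of the
    sequence length N, so the cost is O(N) instead of softmax's O(N^2)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1          # (B, N, d) feature maps
    kv = torch.einsum("bnd,bne->bde", k, v)    # summary, no N x N matrix
    z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)   # normaliser
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

out = linear_attention(*[torch.randn(1, 4096, 32) for _ in range(3)])
```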
arXiv Detail & Related papers (2022-06-01T21:15:01Z)
- nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution.
nnFormer achieves substantial improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC.
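A toy interleaved block, alternating one convolution stage with one self-attention stage (nnFormer's local and global volume-based attention is simplified to plain attention here):

```python
import torch
import torch.nn as nn

class InterleavedBlock(nn.Module):
    """One convolutional stage (local detail) followed by one self-attention
    stage (global context); stacking such blocks interleaves the two."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv3d(dim, dim, 3, padding=1),
                                  nn.InstanceNorm3d(dim), nn.GELU())
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (B, C, D, H, W)
        x = x + self.conv(x)                   # convolution stage
        B, C, D, H, W = x.shape
        t = x.flatten(2).transpose(1, 2)       # (B, D*H*W, C) tokens
        n = self.norm(t)
        t = t + self.attn(n, n, n)[0]          # attention stage
        return t.transpose(1, 2).reshape(B, C, D, H, W)

y = InterleavedBlock(32)(torch.randn(1, 32, 4, 8, 8))
```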
arXiv Detail & Related papers (2021-09-07T17:08:24Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
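A sketch of the bridging idea: a CNN extracts multi-scale features, which are flattened into a single token sequence for a transformer. CoTr's actual bridge uses deformable attention over these tokens to keep the cost tractable; the plain attention below is a simplification:

```python
import torch
import torch.nn as nn

class CNNTransformerBridge(nn.Module):
    """A small 3D CNN produces features at two scales; both are flattened,
    concatenated into one token sequence, and refined jointly by attention."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.stage1 = nn.Conv3d(1, dim, 3, stride=2, padding=1)
        self.stage2 = nn.Conv3d(dim, dim, 3, stride=2, padding=1)
        self.attn = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, 1, D, H, W)
        f1 = self.stage1(x)                    # 1/2-resolution features
        f2 = self.stage2(f1)                   # 1/4-resolution features
        tokens = torch.cat(
            [f.flatten(2).transpose(1, 2) for f in (f1, f2)], dim=1)
        return self.attn(tokens)               # jointly refined tokens

out = CNNTransformerBridge()(torch.randn(1, 1, 8, 16, 16))   # (1, 288, 64)
```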
arXiv Detail & Related papers (2021-03-04T13:34:22Z)