Transformer Utilization in Medical Image Segmentation Networks
- URL: http://arxiv.org/abs/2304.04225v1
- Date: Sun, 9 Apr 2023 12:35:22 GMT
- Title: Transformer Utilization in Medical Image Segmentation Networks
- Authors: Saikat Roy, Gregor Koehler, Michael Baumgartner, Constantin Ulrich,
Jens Petersen, Fabian Isensee, Klaus Maier-Hein
- Abstract summary: We introduce Transformer Ablations that replace the Transformer blocks with plain linear operators to quantify effectiveness.
With experiments on 8 models across 2 medical image segmentation tasks, we explore: 1) the replaceable nature of Transformer-learnt representations; 2) that Transformer capacity alone cannot prevent representational replaceability; and 3) that the mere existence of explicit feature hierarchies in Transformer blocks is more beneficial than the accompanying self-attention modules.
- Score: 1.4764524377532229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Owing to success in the data-rich domain of natural images, Transformers have
recently become popular in medical image segmentation. However, the pairing of
Transformers with convolutional blocks in varying architectural permutations
leaves their relative effectiveness open to interpretation. We introduce
Transformer Ablations that replace the Transformer blocks with plain linear
operators to quantify this effectiveness. With experiments on 8 models across
2 medical image segmentation tasks, we explore: 1) the replaceable nature of
Transformer-learnt representations; 2) that Transformer capacity alone cannot
prevent representational replaceability and instead works in tandem with
effective design; 3) that the mere existence of explicit feature hierarchies in
Transformer blocks is more beneficial than the accompanying self-attention
modules; and 4) that major spatial downsampling before Transformer modules
should be used with caution.
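The core intervention lends itself to a short sketch. Below is a minimal PyTorch illustration of a Transformer Ablation, assuming a model whose Transformer blocks consume and produce (batch, tokens, dim) tensors; the class and function names are illustrative, not the authors' released code.

```python
# Minimal sketch of a "Transformer Ablation": swap a Transformer block for a
# plain linear operator with the same input/output interface, then retrain and
# evaluate to see how much segmentation performance actually drops.
# Names here are illustrative; the paper's official code may differ.
import torch
import torch.nn as nn

class LinearAblationBlock(nn.Module):
    """Drop-in replacement for a Transformer block: one token-wise linear map."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim) -- same shape in, same shape out,
        # but no self-attention and no mixing across tokens.
        return self.proj(x)

def ablate_transformer_blocks(model: nn.Module, dim: int) -> nn.Module:
    """Replace every nn.TransformerEncoderLayer in `model` with a linear block."""
    for name, child in model.named_children():
        if isinstance(child, nn.TransformerEncoderLayer):
            setattr(model, name, LinearAblationBlock(dim))
        else:
            ablate_transformer_blocks(child, dim)  # recurse into submodules
    return model
```

If segmentation accuracy survives such a replacement after retraining, the ablated Transformer block was contributing little beyond a learnable linear map.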
Related papers
- Simplifying Graph Transformers [64.50059165186701]
We propose three simple modifications to the plain Transformer to render it applicable to graphs without introducing major architectural distortions.
Specifically, we advocate for the use of (1) simplified $L_2$ attention to measure the magnitude closeness of tokens; (2) adaptive root-mean-square normalization to preserve token magnitude information; and (3) a relative positional encoding bias with a shared encoder.
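The first modification admits a small sketch. Below is one hedged reading of a simplified $L_2$ attention, where the usual dot product is replaced by a negative squared Euclidean distance so token magnitudes influence closeness; the paper's exact scaling and normalization may differ.

```python
# Sketch of L2 (Euclidean-distance) attention: closer tokens, in both
# direction and magnitude, receive larger weights. Illustrative only.
import torch

def l2_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: (batch, tokens, dim)
    d = q.shape[-1]
    dist2 = torch.cdist(q, k, p=2).pow(2)           # (batch, q_tokens, k_tokens)
    attn = torch.softmax(-dist2 / d ** 0.5, dim=-1)  # near tokens -> high weight
    return attn @ v
```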
arXiv Detail & Related papers (2025-04-17T02:06:50Z)
- Primus: Enforcing Attention Usage for 3D Medical Image Segmentation [1.2015918742353526]
We analyze current Transformer-based segmentation models and identify critical shortcomings.
We introduce a fully Transformer-based segmentation architecture, termed Primus.
Primus surpasses current Transformer-based methods and competes with state-of-the-art convolutional models on public datasets.
arXiv Detail & Related papers (2025-03-03T18:56:29Z)
- ModeT: Learning Deformable Image Registration via Motion Decomposition Transformer [7.629385629884155]
We propose a novel motion decomposition Transformer (ModeT) to explicitly model multiple motion modalities.
Our method outperforms current state-of-the-art registration networks and Transformers.
arXiv Detail & Related papers (2023-06-09T06:00:05Z)
- TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model [4.320393382724066]
Convolutional neural networks (CNNs) have long been the prevailing technique in medical image processing.
We propose TransNorm, a novel deep segmentation framework which consolidates a Transformer module into both the encoder and the skip-connections of the standard U-Net.
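One way to picture the idea is a skip connection routed through a Transformer block before fusion with the decoder path. The sketch below is a generic illustration of that placement, not TransNorm's actual spatial-normalization module.

```python
# Sketch: pass a U-Net skip feature map through a Transformer block before it
# is concatenated with the decoder feature. Generic illustration only.
import torch
import torch.nn as nn

class TransformerSkip(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        # channels must be divisible by heads
        self.block = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)

    def forward(self, skip: torch.Tensor) -> torch.Tensor:
        # skip: (B, C, H, W) encoder feature map -> tokens -> back to a map.
        b, c, h, w = skip.shape
        tokens = skip.flatten(2).transpose(1, 2)        # (B, H*W, C)
        tokens = self.block(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)
```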
arXiv Detail & Related papers (2022-07-27T09:54:10Z)
- HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling [126.89573619301953]
We propose a new design of hierarchical vision transformers named HiViT (short for Hierarchical ViT).
HiViT enjoys both high efficiency and good performance in MIM.
When running MAE on ImageNet-1K, HiViT-B reports a +0.6% accuracy gain over ViT-B and a 1.9$\times$ speed-up over Swin-B.
arXiv Detail & Related papers (2022-05-30T09:34:44Z)
- Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks [126.33843752332139]
We introduce Group-wise Transformation towards a universal yet lightweight Transformer for vision-and-language tasks, termed LW-Transformer.
We apply LW-Transformer to a set of Transformer-based networks, and quantitatively measure them on three vision-and-language tasks and six benchmark datasets.
Experimental results show that while saving a large number of parameters and computations, LW-Transformer achieves very competitive performance against the original Transformer networks for vision-and-language tasks.
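The parameter saving comes from transforming channel groups independently. A minimal sketch of a group-wise transformation follows, with layer choices that are illustrative rather than the paper's exact design.

```python
# Sketch of a group-wise transformation: split the feature dimension into G
# groups, transform each group with a small sub-layer, then concatenate.
# A full-width d x d linear has d^2 weights; G group-wise (d/G) x (d/G)
# linears have d^2 / G in total -- the saving this line of work exploits.
import torch
import torch.nn as nn

class GroupWiseLinear(nn.Module):
    def __init__(self, dim: int, groups: int):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        sub = dim // groups
        self.subs = nn.ModuleList(nn.Linear(sub, sub) for _ in range(groups))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> split along channels, transform, re-join.
        chunks = x.chunk(self.groups, dim=-1)
        return torch.cat([f(c) for f, c in zip(self.subs, chunks)], dim=-1)
```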
arXiv Detail & Related papers (2022-04-16T11:30:26Z)
- Class-Aware Generative Adversarial Transformers for Medical Image Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformer, for medical image segmentation.
First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations.
We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
arXiv Detail & Related papers (2022-01-26T03:50:02Z)
- The Nuts and Bolts of Adopting Transformer in GANs [124.30856952272913]
We investigate the properties of Transformer in the generative adversarial network (GAN) framework for high-fidelity image synthesis.
Our study leads to a new alternative design of Transformers in GAN, a convolutional neural network (CNN)-free generator termed STrans-G.
arXiv Detail & Related papers (2021-10-25T17:01:29Z)
- DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation [18.755217252996754]
We propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet)
Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on Swin Transformer to extract the coarse- and fine-grained feature representations of different semantic scales.
As the core component for our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism.
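A hedged sketch of a TIF-style fusion step: tokens from two scales are projected to a shared width, concatenated, and mixed by self-attention so each scale attends to the other. This is one interpretation of the summary, not DS-TransUNet's exact module.

```python
# Sketch of cross-scale token fusion via self-attention. Illustrative only.
import torch
import torch.nn as nn

class CrossScaleFusion(nn.Module):
    def __init__(self, coarse_dim: int, fine_dim: int, dim: int, heads: int = 4):
        super().__init__()
        self.pc = nn.Linear(coarse_dim, dim)   # project coarse tokens
        self.pf = nn.Linear(fine_dim, dim)     # project fine tokens
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        # coarse: (B, Nc, coarse_dim), fine: (B, Nf, fine_dim)
        tokens = torch.cat([self.pc(coarse), self.pf(fine)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)  # global, cross-scale mixing
        return fused  # (B, Nc + Nf, dim)
```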
arXiv Detail & Related papers (2021-06-12T08:37:17Z)
- Scalable Transformers for Neural Machine Translation [86.4530299266897]
Transformer has been widely adopted in Neural Machine Translation (NMT) because of its large capacity and parallel training of sequence generation.
We propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales with shared parameters.
A three-stage training scheme is proposed to tackle the difficulty of training the Scalable Transformers.
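The sharing can be pictured as smaller sub-models slicing the full model's weights. A minimal sketch of one shared, sliceable layer follows; the paper's actual sharing and training scheme may differ.

```python
# Sketch of parameter sharing across sub-Transformers: a smaller sub-model
# reuses a slice of the full model's weight matrices, so Transformers of
# several widths live inside one parameter set. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceableLinear(nn.Module):
    def __init__(self, max_dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_dim, max_dim) / max_dim ** 0.5)
        self.bias = nn.Parameter(torch.zeros(max_dim))

    def forward(self, x: torch.Tensor, dim: int) -> torch.Tensor:
        # x: (..., dim). The sub-model's weights are the top-left dim x dim
        # block of the full matrix, shared with the full-width model.
        return F.linear(x, self.weight[:dim, :dim], self.bias[:dim])
```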
arXiv Detail & Related papers (2021-06-04T04:04:10Z)
- Transformer-Based Deep Image Matching for Generalizable Person Re-identification [114.56752624945142]
We investigate the possibility of applying Transformers for image matching and metric learning given pairs of images.
We find that the Vision Transformer (ViT) and the vanilla Transformer with decoders are not adequate for image matching due to their lack of image-to-image attention.
We propose a new simplified decoder, which drops the full attention implementation with the softmax weighting, keeping only the query-key similarity.
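The simplification is concrete enough to sketch: the decoder keeps the raw query-key similarity between the two images' tokens and drops the softmax-weighted value aggregation of full attention. Details below are illustrative.

```python
# Sketch of a softmax-free query-key similarity for image matching.
import torch

def qk_similarity(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    # feat_a: (B, Na, dim) tokens of image A; feat_b: (B, Nb, dim) of image B.
    d = feat_a.shape[-1]
    sim = feat_a @ feat_b.transpose(1, 2) / d ** 0.5  # (B, Na, Nb) score map
    # No softmax, no value projection: the similarity map itself carries the
    # image-to-image matching signal.
    return sim
```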
arXiv Detail & Related papers (2021-05-30T05:38:33Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
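A simplified sketch of the gating idea along one image axis: single-axis self-attention whose relative positional term is scaled by a learnable gate, so positional priors can be down-weighted when training data is scarce. The paper gates several terms; this is a reduced illustration.

```python
# Sketch of gated axial attention along one axis of a feature map.
import torch
import torch.nn as nn

class GatedAxialAttention1D(nn.Module):
    def __init__(self, dim: int, length: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.rel_bias = nn.Parameter(torch.zeros(length, length))  # positional term
        self.gate = nn.Parameter(torch.zeros(1))                   # learnable control

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, dim) -- one row or column of the feature map.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        d = q.shape[-1]
        scores = q @ k.transpose(1, 2) / d ** 0.5
        scores = scores + self.gate * self.rel_bias  # gate scales positional prior
        return torch.softmax(scores, dim=-1) @ v
```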
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.