Mixed Transformer U-Net For Medical Image Segmentation
- URL: http://arxiv.org/abs/2111.04734v2
- Date: Thu, 11 Nov 2021 05:51:20 GMT
- Title: Mixed Transformer U-Net For Medical Image Segmentation
- Authors: Hongyi Wang, Shiao Xie, Lanfen Lin, Yutaro Iwamoto, Xian-Hua Han,
Yen-Wei Chen, Ruofeng Tong
- Abstract summary: We propose a novel Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning.
By using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation.
- Score: 14.046456257175237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though U-Net has achieved tremendous success in medical image segmentation
tasks, it lacks the ability to explicitly model long-range dependencies.
Therefore, Vision Transformers have emerged as alternative segmentation
structures recently, for their innate ability of capturing long-range
correlations through Self-Attention (SA). However, Transformers usually rely on
large-scale pre-training and have high computational complexity. Furthermore,
SA can only model self-affinities within a single sample, ignoring the
potential correlations of the overall dataset. To address these problems, we
propose a novel Transformer module named Mixed Transformer Module (MTM) for
simultaneous inter- and intra-affinity learning. MTM first calculates
self-affinities efficiently through our well-designed Local-Global
Gaussian-Weighted Self-Attention (LGG-SA). Then, it mines inter-connections
between data samples through External Attention (EA). By using MTM, we
construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate
medical image segmentation. We test our method on two different public
datasets, and the experimental results show that the proposed method achieves
better performance over other state-of-the-art methods. The code is available
at: https://github.com/Dootmaan/MT-UNet.
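The abstract states that MTM mines inter-sample connections through External Attention (EA). The abstract itself gives no implementation details; the following numpy sketch follows the published EA formulation (Guo et al., 2021), where two small learnable memory units shared across the dataset replace the per-sample keys and values of Self-Attention. The memory size and the double-normalization order are assumptions taken from that paper, not from this abstract.

```python
import numpy as np

def external_attention(x, m_k, m_v):
    """Sketch of External Attention (EA).

    x:   (n, d) token features for one sample.
    m_k: (s, d) learnable external key memory, shared across all samples.
    m_v: (s, d) learnable external value memory, shared across all samples.

    Because m_k and m_v are shared over the whole dataset, the attention map
    can reflect dataset-level correlations, and the cost is linear in n
    (versus the O(n^2) cost of Self-Attention).
    """
    attn = x @ m_k.T                                         # (n, s) affinities
    # Double normalization: softmax over the token axis ...
    attn = np.exp(attn - attn.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)
    # ... then L1-normalize over the memory axis.
    attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-9)
    return attn @ m_v                                        # (n, d) output
```

A usage note: in a trained model `m_k` and `m_v` would be learned linear-layer weights; here they are plain arrays for illustration.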
Related papers
- HMT-UNet: A hybird Mamba-Transformer Vision UNet for Medical Image Segmentation [1.5574423250822542]
We propose a U-shaped architecture model for medical image segmentation, named Hybrid Transformer Vision Mamba UNet (HTM-UNet).
We conduct comprehensive experiments on the ISIC17, ISIC18, CVC-300, CVC-ClinicDB, Kvasir, CVC-ColonDB, ETIS-Larib PolypDB public datasets and ZD-LCI-GIM private dataset.
arXiv Detail & Related papers (2024-08-21T02:25:14Z) - MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter-count-dominated tasks such as language modeling.
arXiv Detail & Related papers (2024-05-25T03:24:32Z) - CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer [8.962657021133925]
Cross-scale transformer (CT) processes feature representations at different stages without additional computation.
We introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales.
We also present a dual-feature guided aggregation (DFGA) that embeds the coarse global semantic information into the finer cost volume construction.
arXiv Detail & Related papers (2023-12-14T01:33:18Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - Deformable Mixer Transformer with Gating for Multi-Task Learning of Dense Prediction [126.34551436845133]
CNNs and Transformers have their own advantages and both have been widely used for dense prediction in multi-task learning (MTL)
We present a novel MTL model by combining both merits of deformable CNN and query-based Transformer with shared gating for multi-task learning of dense prediction.
arXiv Detail & Related papers (2023-08-10T17:37:49Z) - TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model [4.320393382724066]
Convolutional neural networks (CNNs) have been the prevailing technique in medical image processing.
We propose TransNorm, a novel deep segmentation framework which consolidates a Transformer module into both the encoder and skip-connections of the standard U-Net.
arXiv Detail & Related papers (2022-07-27T09:54:10Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z) - Mixup-Transformer: Dynamic Data Augmentation for NLP Tasks [75.69896269357005]
Mixup is the latest data augmentation technique that linearly interpolates input examples and the corresponding labels.
In this paper, we explore how to apply mixup to natural language processing tasks.
We incorporate mixup to transformer-based pre-trained architecture, named "mixup-transformer", for a wide range of NLP tasks.
arXiv Detail & Related papers (2020-10-05T23:37:30Z)
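The mixup entry above describes linearly interpolating input examples and their labels. A minimal input-level sketch (assuming one-hot label vectors and the standard Beta(alpha, alpha) sampling of the mixing coefficient from the original mixup paper, Zhang et al., 2018) could look as follows; the mixup-transformer paper applies the same interpolation idea inside transformer-based models for NLP.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two examples: x' = lam*x1 + (1-lam)*x2, same for the labels.

    x1, x2: input feature arrays of the same shape.
    y1, y2: one-hot label vectors of the same shape.
    lam is drawn from Beta(alpha, alpha), so it concentrates near 0 or 1
    for small alpha and near 0.5 for large alpha.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam
```

The interpolated soft label `y` still sums to 1 when `y1` and `y2` are one-hot, so it can be used directly with a cross-entropy loss.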
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.