Multi-Compound Transformer for Accurate Biomedical Image Segmentation
- URL: http://arxiv.org/abs/2106.14385v1
- Date: Mon, 28 Jun 2021 03:45:44 GMT
- Title: Multi-Compound Transformer for Accurate Biomedical Image Segmentation
- Authors: Yuanfeng Ji, Ruimao Zhang, Huijie Wang, Zhen Li, Lingyun Wu, Shaoting
Zhang, and Ping Luo
- Abstract summary: We propose a unified transformer network, termed Multi-Compound Transformer (MCTrans).
MCTrans embeds the multi-scale convolutional features as a sequence of tokens and performs intra- and inter-scale self-attention.
MCTrans can be easily plugged into a UNet-like network and attains a significant improvement over the state-of-the-art methods in biomedical image segmentation.
- Score: 33.49158559361491
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent vision transformer (i.e., for image classification) learns non-local
attentive interaction of different patch tokens. However, prior work fails to
learn the cross-scale dependencies of different pixels, the semantic
correspondence of different labels, and the consistency of the feature
representations and semantic embeddings, which are critical for biomedical
segmentation. In this paper, we tackle the above issues by proposing a unified
transformer network, termed Multi-Compound Transformer (MCTrans), which
incorporates rich feature learning and semantic structure mining into a unified
framework. Specifically, MCTrans embeds the multi-scale convolutional features
as a sequence of tokens and performs intra- and inter-scale self-attention,
rather than single-scale attention in previous works. In addition, a learnable
proxy embedding is also introduced to model semantic relationship and feature
enhancement by using self-attention and cross-attention, respectively. MCTrans
can be easily plugged into a UNet-like network and attains a significant
improvement over the state-of-the-art methods in biomedical image segmentation
on six standard benchmarks. For example, MCTrans outperforms UNet by 3.64%,
3.71%, 4.34%, 2.8%, 1.88%, and 1.57% on the PanNuke, CVC-Clinic, CVC-Colon, ETIS,
Kvasir, and ISIC2018 datasets, respectively. Code is available at
https://github.com/JiYuanFeng/MCTrans.
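The token construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: plain dense single-head attention is swapped in for whatever attention module MCTrans actually uses, and the shared channel width `C`, the random projection weights, the three feature-map sizes, the number of proxy vectors, and the helper names `tokens_from_scales`, `self_attention`, and `cross_attention` are all assumptions made for illustration. Feature maps from several scales are flattened into one token sequence, so a single attention pass covers both intra-scale and inter-scale token pairs; a small set of proxy vectors then attends to those tokens by cross-attention (the direction of that cross-attention is also an assumption).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tokens_from_scales(feature_maps):
    """Flatten each (C, H, W) map into (H*W, C) tokens and concatenate all scales."""
    tokens = [f.reshape(f.shape[0], -1).T for f in feature_maps]
    # Remember which scale every token came from.
    scale_ids = np.concatenate(
        [np.full(t.shape[0], i) for i, t in enumerate(tokens)]
    )
    return np.concatenate(tokens, axis=0), scale_ids

def self_attention(x, w_q, w_k, w_v):
    """Dense single-head attention over the whole multi-scale sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights

def cross_attention(queries, context, w_q, w_k, w_v):
    """Queries (here: proxy vectors) attend to context tokens."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) @ v

rng = np.random.default_rng(0)
C = 8  # hypothetical shared channel width across scales
feature_maps = [
    rng.standard_normal((C, 8, 8)),  # fine scale
    rng.standard_normal((C, 4, 4)),  # middle scale
    rng.standard_normal((C, 2, 2)),  # coarse scale
]
tokens, scale_ids = tokens_from_scales(feature_maps)

# Random stand-ins for learned projection matrices.
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)

# Hypothetical proxy embedding: one vector per semantic class (3 classes here).
proxy = rng.standard_normal((3, C))
proxy_out = cross_attention(proxy, out, w_q, w_k, w_v)

print(tokens.shape, out.shape, proxy_out.shape)
```

Because every token's attention row spans the full concatenated sequence, a fine-scale token places some weight on coarse-scale tokens, which is the inter-scale interaction the abstract contrasts with single-scale attention.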
Related papers
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation [5.5582646801199225]
This study proposes a novel deep medical image segmentation framework, called DA-TransUNet.
It aims to integrate the Transformer and dual attention block (DA-Block) into the traditional U-shaped architecture.
Unlike earlier transformer-based U-net models, DA-TransUNet utilizes Transformers and DA-Block to integrate not only global and local features, but also image-specific positional and channel features.
arXiv Detail & Related papers (2023-10-19T08:25:03Z)
- SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer for enhancing discriminative representation learning.
The two proposed modules are lightweight and can be plugged into any transformer network and trained end-to-end easily.
Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation [33.45471457058221]
This paper proposes a novel Transformer based medical image semantic segmentation framework called TransAttUnet.
In particular, we establish additional multi-scale skip connections between decoder blocks to aggregate the different semantic-scale upsampling features.
Our method consistently outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2021-07-12T09:17:06Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation [18.755217252996754]
We propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet).
Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on Swin Transformer to extract the coarse and fine-grained feature representations of different semantic scales.
As the core component for our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism.
arXiv Detail & Related papers (2021-06-12T08:37:17Z)
- MlTr: Multi-label Classification with Transformer [35.14232810099418]
We propose a Multi-label Transformer architecture (MlTr) constructed with window partitioning, in-window pixel attention, and cross-window attention.
The proposed MlTr shows state-of-the-art results on various prevalent multi-label datasets such as MS-COCO, Pascal-VOC, and NUS-WIDE.
arXiv Detail & Related papers (2021-06-11T06:53:09Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.