Patcher: Patch Transformers with Mixture of Experts for Precise Medical
Image Segmentation
- URL: http://arxiv.org/abs/2206.01741v2
- Date: Mon, 29 May 2023 23:52:49 GMT
- Title: Patcher: Patch Transformers with Mixture of Experts for Precise Medical
Image Segmentation
- Authors: Yanglan Ou, Ye Yuan, Xiaolei Huang, Stephen T.C. Wong, John Volpi,
James Z. Wang, Kelvin Wong
- Abstract summary: We present a new encoder-decoder Vision Transformer architecture, Patcher, for medical image segmentation.
Unlike standard Vision Transformers, it employs Patcher blocks that segment an image into large patches.
Transformers are applied to the small patches within a large patch, which constrains the receptive field of each pixel.
- Score: 17.51577168487812
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new encoder-decoder Vision Transformer architecture, Patcher,
for medical image segmentation. Unlike standard Vision Transformers, it employs
Patcher blocks that segment an image into large patches, each of which is
further divided into small patches. Transformers are applied to the small
patches within a large patch, which constrains the receptive field of each
pixel. We intentionally make the large patches overlap to enhance intra-patch
communication. The encoder employs a cascade of Patcher blocks with increasing
receptive fields to extract features from local to global levels. This design
allows Patcher to benefit from both the coarse-to-fine feature extraction
common in CNNs and the superior spatial relationship modeling of Transformers.
We also propose a new mixture-of-experts (MoE) based decoder, which treats the
feature maps from the encoder as experts and selects a suitable set of expert
features to predict the label for each pixel. The use of MoE enables better
specializations of the expert features and reduces interference between them
during inference. Extensive experiments demonstrate that Patcher outperforms
state-of-the-art Transformer- and CNN-based approaches significantly on stroke
lesion segmentation and polyp segmentation. Code for Patcher is released with
publication to facilitate future research.
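The core Patcher idea described above, splitting an image into overlapping large patches and subdividing each into small patches so attention (and hence each pixel's receptive field) is confined to one large patch, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the patch sizes, stride, and single-channel input are assumptions for clarity.

```python
import numpy as np

def patcher_split(image, large=8, stride=6, small=2):
    """Split an image into overlapping large patches, each further divided
    into non-overlapping small patches (illustrative sketch, not the paper's
    actual code). stride < large makes the large patches overlap, echoing
    the paper's intra-patch communication trick."""
    H, W = image.shape
    large_patches = []
    for y in range(0, H - large + 1, stride):
        for x in range(0, W - large + 1, stride):
            lp = image[y:y + large, x:x + large]
            # Divide the large patch into a grid of small patches; a
            # Transformer would attend only among these small patches,
            # bounding each pixel's receptive field to the large patch.
            sp = lp.reshape(large // small, small, large // small, small)
            sp = sp.transpose(0, 2, 1, 3).reshape(-1, small * small)
            large_patches.append(sp)
    return np.stack(large_patches)

img = np.arange(16 * 16, dtype=float).reshape(16, 16)
tokens = patcher_split(img)
print(tokens.shape)  # (num_large_patches, num_small_patches, small*small)
```

With a 16x16 input, 8x8 large patches at stride 6 yield 4 overlapping large patches, each holding 16 small-patch tokens of dimension 4.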
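The MoE decoder described above treats the encoder's multi-level feature maps as experts and weights them per pixel. A minimal numpy sketch of per-pixel expert gating is below; the shapes, the linear gating network, and the assumption that expert maps are already upsampled to a common resolution are all illustrative choices, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_decode(expert_feats, gate_w):
    """Per-pixel mixture-of-experts fusion (illustrative sketch).

    expert_feats: (E, H, W, C) feature maps from E encoder levels,
                  assumed upsampled to a common resolution.
    gate_w:       (E*C, E) weights of a hypothetical linear gating network.
    """
    E, H, W, C = expert_feats.shape
    # Gating input: concatenate all experts along channels at each pixel.
    gate_in = expert_feats.transpose(1, 2, 0, 3).reshape(H, W, E * C)
    logits = gate_in @ gate_w                  # (H, W, E)
    weights = softmax(logits, axis=-1)         # per-pixel expert weights
    # Weighted sum of expert features -> fused per-pixel representation,
    # from which a pixel-wise label would be predicted.
    fused = np.einsum('hwe,ehwc->hwc', weights, expert_feats)
    return fused, weights

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8, 16))   # 4 experts, 8x8 map, 16 channels
w = rng.standard_normal((4 * 16, 4))
fused, weights = moe_decode(feats, w)
print(fused.shape, weights.shape)
```

The softmax gate gives each pixel its own convex combination of experts, which is what lets expert features specialize with limited interference.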
Related papers
- Augmenting Prototype Network with TransMix for Few-shot Hyperspectral
Image Classification [9.479240476603353]
We propose to augment the prototype network with TransMix for few-shot hyperspectral image classification (APNT).
While taking the prototype network as the backbone, it adopts a transformer as the feature extractor to learn pixel-to-pixel relations.
The proposed method demonstrates state-of-the-art performance and better robustness for few-shot hyperspectral image classification.
arXiv Detail & Related papers (2024-01-22T06:56:52Z) - MIST: Medical Image Segmentation Transformer with Convolutional
Attention Mixing (CAM) Decoder [0.0]
We propose a Medical Image Segmentation Transformer (MIST) incorporating a novel Convolutional Attention Mixing (CAM) decoder.
MIST has two parts: a pre-trained multi-axis vision transformer (MaxViT) is used as an encoder, and the encoded feature representation is passed through the CAM decoder for segmenting the images.
To enhance spatial information gain, deep and shallow convolutions are used for feature extraction and receptive field expansion.
arXiv Detail & Related papers (2023-10-30T18:07:57Z) - Pure Transformer with Integrated Experts for Scene Text Recognition [11.089203218000854]
Scene text recognition (STR) involves the task of reading text in cropped images of natural scenes.
In recent times, the transformer architecture has been widely adopted in STR for its strong capability in capturing long-term dependencies.
This work proposes the use of a transformer-only model as a simple baseline which outperforms hybrid CNN-transformer models.
arXiv Detail & Related papers (2022-11-09T15:26:59Z) - HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, starting with tokens of small patch size and gradually merging them up to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z) - Exploring and Improving Mobile Level Vision Transformers [81.7741384218121]
In this paper, we study the vision transformer structure at the mobile level and find a dramatic performance drop.
We propose a novel irregular patch embedding module and adaptive patch fusion module to improve the performance.
arXiv Detail & Related papers (2021-08-30T06:42:49Z) - DPT: Deformable Patch-based Transformer for Visual Recognition [57.548916081146814]
We propose a new Deformable Patch (DePatch) module which learns to adaptively split the images into patches with different positions and scales in a data-driven way.
The DePatch module can work as a plug-and-play module, which can easily be incorporated into different transformers to achieve an end-to-end training.
arXiv Detail & Related papers (2021-07-30T07:33:17Z) - Medical Image Segmentation using Squeeze-and-Expansion Transformers [12.793250990122557]
Segtran is an alternative segmentation framework based on transformers.
Segtran consistently achieved the highest segmentation accuracy, and exhibited good cross-domain generalization capabilities.
arXiv Detail & Related papers (2021-05-20T04:45:47Z) - Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
with Transformers [149.78470371525754]
We treat semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer to encode an image as a sequence of patches.
With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR)
SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes.
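SETR's sequence-to-sequence view, encoding an image as a flat sequence of patch tokens for a pure transformer, can be sketched in a few lines. The patch size and zero-valued input are illustrative assumptions; this is not the SETR codebase.

```python
import numpy as np

def image_to_patch_sequence(image, patch=4):
    """Flatten an image into a sequence of patch tokens (SETR-style sketch).

    Each patch becomes one token, so a transformer encoder can model
    global context across the whole image at every layer."""
    H, W, C = image.shape
    gh, gw = H // patch, W // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    tokens = x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * C)
    return tokens  # (sequence_length, token_dim)

img = np.zeros((16, 16, 3))
seq = image_to_patch_sequence(img)
print(seq.shape)  # 16 tokens of dimension 48
```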
arXiv Detail & Related papers (2020-12-31T18:55:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.