Multi-scale Hierarchical Vision Transformer with Cascaded Attention
Decoding for Medical Image Segmentation
- URL: http://arxiv.org/abs/2303.16892v1
- Date: Wed, 29 Mar 2023 17:58:40 GMT
- Title: Multi-scale Hierarchical Vision Transformer with Cascaded Attention
Decoding for Medical Image Segmentation
- Authors: Md Mostafijur Rahman and Radu Marculescu
- Abstract summary: We introduce a Multi-scale hiERarchical vIsion Transformer (MERIT) backbone network, which improves the generalizability of the model by computing SA at multiple scales.
We also incorporate an attention-based decoder, namely Cascaded Attention Decoding (CASCADE), for further refinement of multi-stage features generated by MERIT.
- Score: 8.530680502975095
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Transformers have shown great success in medical image segmentation. However,
transformers may exhibit a limited generalization ability due to the underlying
single-scale self-attention (SA) mechanism. In this paper, we address this
issue by introducing a Multi-scale hiERarchical vIsion Transformer (MERIT)
backbone network, which improves the generalizability of the model by computing
SA at multiple scales. We also incorporate an attention-based decoder, namely
Cascaded Attention Decoding (CASCADE), for further refinement of multi-stage
features generated by MERIT. Finally, we introduce an effective multi-stage
feature mixing loss aggregation (MUTATION) method for better model training via
implicit ensembling. Our experiments on two widely used medical image
segmentation benchmarks (i.e., Synapse Multi-organ, ACDC) demonstrate the
superior performance of MERIT over state-of-the-art methods. Our MERIT
architecture and MUTATION loss aggregation can also be applied to other downstream
medical image and semantic segmentation tasks.
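The abstract does not spell out how MUTATION mixes the multi-stage predictions; the sketch below is one plausible reading, assuming four decoder prediction heads at a common resolution and a generic segmentation loss. The function name, the summation-based mixing, and the uniform averaging over combinations are illustrative assumptions, not the paper's definition.

```python
import itertools
import torch
import torch.nn.functional as F

def multistage_mixing_loss(stage_logits, target,
                           base_loss=F.binary_cross_entropy_with_logits):
    """Sketch of a MUTATION-style loss aggregation: every non-empty
    combination of the multi-stage prediction maps is mixed (summed here)
    and scored with the base segmentation loss, acting as an implicit
    ensemble during training. The exact mixing/weighting used by the paper
    may differ (assumption)."""
    total, count = 0.0, 0
    for r in range(1, len(stage_logits) + 1):
        for combo in itertools.combinations(stage_logits, r):
            mixed = torch.stack(combo, dim=0).sum(dim=0)  # mix selected stages
            total = total + base_loss(mixed, target)
            count += 1
    return total / count                                  # average over combinations

# usage sketch: four decoder stages predicting a binary mask at one resolution
preds = [torch.randn(2, 1, 224, 224) for _ in range(4)]
mask = torch.randint(0, 2, (2, 1, 224, 224)).float()
loss = multistage_mixing_loss(preds, mask)
```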
Related papers
- MOSformer: Momentum encoder-based inter-slice fusion transformer for
medical image segmentation [15.94370954641629]
2.5D-based segmentation models often treat each slice equally, failing to effectively learn and exploit inter-slice information.
A novel Momentum encoder-based inter-slice fusion transformer (MOSformer) is proposed to overcome this issue.
The MOSformer is evaluated on three benchmark datasets (Synapse, ACDC, and AMOS), establishing a new state of the art with DSC scores of 85.63%, 92.19%, and 85.43%, respectively.
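As background for the momentum encoder mentioned above (and not a description of MOSformer's inter-slice fusion itself), a minimal sketch of a momentum/EMA encoder update in the style popularized by MoCo-like methods is given below; the helper name and the coefficient `m` are assumptions.

```python
import torch

@torch.no_grad()
def momentum_update(online_encoder, momentum_encoder, m=0.999):
    """EMA update: the momentum encoder slowly tracks the online encoder,
    providing stable features for the other slices (coefficient m assumed)."""
    for p_o, p_m in zip(online_encoder.parameters(),
                        momentum_encoder.parameters()):
        p_m.data.mul_(m).add_(p_o.data, alpha=1.0 - m)

# usage sketch: clone the online encoder once (e.g., with copy.deepcopy),
# then call momentum_update(online, momentum) after every optimizer step.
```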
arXiv Detail & Related papers (2024-01-22T11:25:59Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - DA-TransUNet: Integrating Spatial and Channel Dual Attention with
Transformer U-Net for Medical Image Segmentation [5.5582646801199225]
This study proposes a novel deep medical image segmentation framework, called DA-TransUNet.
It aims to integrate the Transformer and dual attention block (DA-Block) into the traditional U-shaped architecture.
Unlike earlier transformer-based U-net models, DA-TransUNet utilizes Transformers and DA-Block to integrate not only global and local features, but also image-specific positional and channel features.
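The summary only names the DA-Block; a minimal sketch of a generic dual attention block that re-weights channels and spatial positions, in the spirit of spatial/channel dual attention rather than DA-TransUNet's exact module, might look like this:

```python
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    """Sketch of a dual attention block: channel attention re-weights feature
    channels, spatial attention re-weights positions. A generic design in the
    spirit of the DA-Block, not DA-TransUNet's exact module (assumption)."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        # channel attention: squeeze spatial dims, excite channels
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # spatial attention: compress channels, produce a per-pixel weight map
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_mlp(x)                       # channel re-weighting
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        return x * self.spatial_conv(pooled)              # spatial re-weighting
```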
arXiv Detail & Related papers (2023-10-19T08:25:03Z) - M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical
Image Segmentation [73.10707675345253]
We propose a general multi-scale in multi-scale subtraction network (M$^{2}$SNet) to address diverse segmentation tasks in medical images.
Our method performs favorably against most state-of-the-art methods under different evaluation metrics on eleven datasets covering four different medical image segmentation tasks.
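A minimal sketch of the subtraction idea, assuming two same-shaped feature maps and a single convolutional refinement; this is a simplification, not M$^{2}$SNet's actual multi-scale subtraction unit:

```python
import torch
import torch.nn as nn

class SubtractionUnit(nn.Module):
    """Sketch of a subtraction unit: highlight complementary information
    between two feature maps via an element-wise absolute difference,
    followed by a convolution (simplified; layer sizes are assumptions)."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_a, feat_b):
        return self.conv(torch.abs(feat_a - feat_b))  # |A - B|, then refine
```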
arXiv Detail & Related papers (2023-03-20T06:26:49Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with
Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z) - Class-Aware Generative Adversarial Transformers for Medical Image
Segmentation [39.14169989603906]
We present CA-GANformer, a novel type of generative adversarial transformer, for medical image segmentation.
First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations.
We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures.
arXiv Detail & Related papers (2022-01-26T03:50:02Z) - MISSFormer: An Effective Medical Image Segmentation Transformer [3.441872541209065]
CNN-based methods have achieved impressive results in medical image segmentation.
Transformer-based methods have recently become popular in vision tasks because of their capacity to model long-range dependencies.
We present MISSFormer, an effective and powerful Medical Image Segmentation tranSFormer.
arXiv Detail & Related papers (2021-09-15T08:56:00Z) - TransAttUnet: Multi-level Attention-guided U-Net with Transformer for
Medical Image Segmentation [33.45471457058221]
This paper proposes a novel Transformer based medical image semantic segmentation framework called TransAttUnet.
In particular, we establish additional multi-scale skip connections between decoder blocks to aggregate the different semantic-scale upsampling features.
Our method consistently outperforms the state-of-the-art baselines.
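A rough sketch of such multi-scale skip connections between decoder blocks is given below; the channel counts and the 1x1 fusion convolution are illustrative assumptions, not TransAttUnet's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSkipFusion(nn.Module):
    """Sketch: aggregate upsampled features from earlier decoder blocks with
    the current decoder feature (channel counts are illustrative only)."""

    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        self.fuse = nn.Conv2d(sum(in_channels_list), out_channels, 1)

    def forward(self, current, earlier_feats):
        # upsample every earlier decoder feature to the current resolution
        ups = [F.interpolate(f, size=current.shape[-2:], mode='bilinear',
                             align_corners=False) for f in earlier_feats]
        return self.fuse(torch.cat([current] + ups, dim=1))
```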
arXiv Detail & Related papers (2021-07-12T09:17:06Z) - Modality Completion via Gaussian Process Prior Variational Autoencoders
for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan.
MGP-VAE leverages a Gaussian Process (GP) prior in the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities.
We show the applicability of MGP-VAE on brain tumor segmentation, where one, two, or three of the four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
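A minimal sketch of the gating idea, assuming self-attention along a single spatial axis with one additive relative positional bias scaled by a learned gate; the real Gated Axial-Attention gates several query/key/value positional terms, so this is a simplification:

```python
import torch
import torch.nn as nn

class GatedAxialAttention(nn.Module):
    """Sketch: attention along one spatial axis with a learnable gate on a
    relative positional bias (simplified relative to MedT's module)."""

    def __init__(self, dim, heads=4, axis_len=64):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.pos_bias = nn.Parameter(torch.zeros(heads, axis_len, axis_len))
        self.gate = nn.Parameter(torch.zeros(1))      # learned control mechanism

    def forward(self, x):
        # x: (batch, length_along_axis, dim); other axes are folded into batch
        b, n, d = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.heads, d // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)          # each (b, heads, n, d/h)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn + torch.sigmoid(self.gate) * self.pos_bias[:, :n, :n]
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)
```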
arXiv Detail & Related papers (2021-02-21T18:35:14Z) - TransUNet: Transformers Make Strong Encoders for Medical Image
Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the U-shaped architecture, also known as U-Net, has become the de facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.