Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale
Feature Fusion Approach
- URL: http://arxiv.org/abs/2301.10847v1
- Date: Wed, 25 Jan 2023 22:09:07 GMT
- Title: Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale
Feature Fusion Approach
- Authors: Reza Azad, Yiwei Jia, Ehsan Khodapanah Aghdam, Julien Cohen-Adad,
Dorit Merhof
- Abstract summary: CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness.
Transformer-based approaches are currently prevailing since they enlarge the receptive field to model global contextual correlation.
We propose TransCeption for medical image segmentation, a pure transformer-based U-shaped network featuring an inception-like module in the encoder.
- Score: 3.9548535445908928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While CNN-based methods have been the cornerstone of medical image
segmentation due to their promising performance and robustness, they suffer
from limitations in capturing long-range dependencies. Transformer-based
approaches are currently prevailing since they enlarge the receptive field to
model global contextual correlation. To further extract rich representations,
some extensions of the U-Net employ multi-scale feature extraction and fusion
modules and obtain improved performance. Inspired by this idea, we propose
TransCeption for medical image segmentation, a pure transformer-based U-shaped
network that incorporates an inception-like module into the encoder and adopts
a contextual bridge for better feature fusion. The design proposed
in this work is based on three core principles: (1) The patch merging module in
the encoder is redesigned as ResInception Patch Merging (RIPM), and a
multi-branch transformer (MB transformer) adopts the same number of branches
as RIPM has outputs. Combining the two modules enables the model to capture a multi-scale
representation within a single stage. (2) We construct an Intra-stage Feature
Fusion (IFF) module following the MB transformer to enhance the aggregation of
feature maps from all the branches and particularly focus on the interaction
between the different channels of all the scales. (3) In contrast to a bridge
that only contains token-wise self-attention, we propose a Dual Transformer
Bridge that also includes channel-wise self-attention to exploit correlations
between scales at different stages from a dual perspective. Extensive
experiments on multi-organ and skin lesion segmentation tasks demonstrate the
superior performance of TransCeption compared to previous work. The code is
publicly available at \url{https://github.com/mindflow-institue/TransCeption}.
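Principle (3) contrasts token-wise self-attention, where the attention map relates spatial positions, with channel-wise self-attention, where the map relates feature channels. The dual perspective can be illustrated with a minimal NumPy sketch; it is greatly simplified (single head, no learned query/key/value projections) and the function names are illustrative, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_wise_attention(x):
    """x: (tokens, channels). The (tokens x tokens) attention map
    relates spatial positions, as in standard ViT self-attention."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x

def channel_wise_attention(x):
    """Transposed variant: the (channels x channels) attention map
    models correlations between feature channels instead."""
    scores = x.T @ x / np.sqrt(x.shape[0])
    return (softmax(scores, axis=-1) @ x.T).T

# Both views map a (tokens, channels) feature matrix to the same shape,
# so their outputs can be combined in a bridge module.
x = np.random.randn(16, 8)  # 16 spatial tokens, 8 channels
assert token_wise_attention(x).shape == (16, 8)
assert channel_wise_attention(x).shape == (16, 8)
```

In the paper's Dual Transformer Bridge these two attention views are applied to the multi-stage features jointly; the sketch above only shows why the two attention maps have different shapes and therefore capture complementary correlations.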
Related papers
- A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z) - Rethinking Attention Gated with Hybrid Dual Pyramid Transformer-CNN for Generalized Segmentation in Medical Imaging [17.07490339960335]
We introduce a novel hybrid CNN-Transformer segmentation architecture (PAG-TransYnet) designed for efficiently building a strong CNN-Transformer encoder.
Our approach exploits attention gates within a Dual Pyramid hybrid encoder.
arXiv Detail & Related papers (2024-04-28T14:37:10Z) - Unified Frequency-Assisted Transformer Framework for Detecting and
Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z) - MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical
Image Segmentation [7.720152925974362]
We propose a 2D medical image segmentation model called the Multi-scale Cross Perceptron Attention Network (MCPA).
The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron.
We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices.
arXiv Detail & Related papers (2023-07-27T02:18:12Z) - HiFormer: Hierarchical Multi-scale Representations Using Transformers
for Medical Image Segmentation [3.478921293603811]
HiFormer is a novel method that efficiently bridges a CNN and a transformer for medical image segmentation.
To secure a fine fusion of global and local features, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure.
arXiv Detail & Related papers (2022-07-18T11:30:06Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Multimodal Token Fusion for Vision Transformers [54.81107795090239]
We propose a multimodal token fusion method (TokenFusion) for transformer-based vision tasks.
To effectively fuse multiple modalities, TokenFusion dynamically detects uninformative tokens and substitutes these tokens with projected and aggregated inter-modal features.
The design of TokenFusion allows the transformer to learn correlations among multimodal features, while the single-modal transformer architecture remains largely intact.
arXiv Detail & Related papers (2022-04-19T07:47:50Z) - BatchFormerV2: Exploring Sample Relationships for Dense Representation
Learning [88.82371069668147]
BatchFormerV2 is a more general batch Transformer module, which enables exploring sample relationships for dense representation learning.
BatchFormerV2 consistently improves current DETR-based detection methods by over 1.3%.
arXiv Detail & Related papers (2022-04-04T05:53:42Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that transformer is a uniform operation which presents great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - MISSFormer: An Effective Medical Image Segmentation Transformer [3.441872541209065]
CNN-based methods have achieved impressive results in medical image segmentation.
Transformer-based methods have recently become popular in vision tasks because of their capacity to model long-range dependencies.
We present MISSFormer, an effective and powerful Medical Image tranSFormer.
arXiv Detail & Related papers (2021-09-15T08:56:00Z) - DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation [18.755217252996754]
We propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet).
Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on the Swin Transformer to extract coarse- and fine-grained feature representations at different semantic scales.
As the core component for our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism.
arXiv Detail & Related papers (2021-06-12T08:37:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.