H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for
Multimodal Tumor Segmentation
- URL: http://arxiv.org/abs/2307.01486v1
- Date: Tue, 4 Jul 2023 05:31:09 GMT
- Title: H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for
Multimodal Tumor Segmentation
- Authors: Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao,
Zhaohui Wang, Hong An, Xudong Xue
- Abstract summary: In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer.
Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input.
The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity.
- Score: 5.999728323822383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, deep learning methods have been widely used for tumor segmentation
of multimodal medical images with promising results. However, most existing
methods are limited by insufficient representational ability, support for only a
fixed number of modalities, and high computational complexity. In this paper, we propose a hybrid
densely connected network for tumor segmentation, named H-DenseFormer, which
combines the representational power of the Convolutional Neural Network (CNN)
and the Transformer structures. Specifically, H-DenseFormer integrates a
Transformer-based Multi-path Parallel Embedding (MPE) module that can take an
arbitrary number of modalities as input to extract the fusion features from
different modalities. Then, the multimodal fusion features are delivered to
different levels of the encoder to enhance multimodal learning representation.
In addition, we design a lightweight Densely Connected Transformer (DCT) block to
replace the standard Transformer block, thus significantly reducing
computational complexity. We conduct extensive experiments on two public
multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that
our proposed method outperforms the existing state-of-the-art methods while
having lower computational complexity. The source code is available at
https://github.com/shijun18/H-DenseFormer.
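The abstract names two concrete mechanisms, so a small sketch may help make them tangible. Below is a minimal PyTorch illustration of (a) a Multi-path Parallel Embedding that gives each modality its own embedding path, so the modality count is just a constructor argument, and (b) a Densely Connected Transformer block in which each layer works at a narrow width and its output is concatenated onto the running features, DenseNet-style. All class names, layer widths, and the 2D (rather than volumetric) setting are illustrative assumptions, not the authors' implementation; the official code at https://github.com/shijun18/H-DenseFormer is the reference.

```python
# Minimal sketch (not the authors' code) of the MPE module and the DCT
# block described in the abstract. Names, widths, and the 2D setting
# are assumptions for illustration only.
import torch
import torch.nn as nn


class DCTBlock(nn.Module):
    """Densely connected Transformer block: each layer attends over a
    narrow projection (`growth` channels) and its output is concatenated
    onto the running features, which is presumably where the claimed
    compute saving comes from."""

    def __init__(self, dim, growth=32, num_layers=4, heads=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.ModuleDict({
                "proj": nn.Linear(dim + i * growth, growth),  # squeeze input
                "norm": nn.LayerNorm(growth),
                "attn": nn.MultiheadAttention(growth, heads, batch_first=True),
            }))
        self.out = nn.Linear(dim + num_layers * growth, dim)

    def forward(self, x):                        # x: (B, tokens, dim)
        feats = x
        for layer in self.layers:
            h = layer["norm"](layer["proj"](feats))
            h, _ = layer["attn"](h, h, h, need_weights=False)
            feats = torch.cat([feats, h], dim=-1)  # dense connectivity
        return self.out(feats)


class MPE(nn.Module):
    """Multi-path Parallel Embedding: one patch-embedding + DCT path per
    modality, so any number of modalities is handled by adding paths."""

    def __init__(self, num_modalities, dim=64, patch=4):
        super().__init__()
        self.embeds = nn.ModuleList(
            nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
            for _ in range(num_modalities))
        self.dcts = nn.ModuleList(DCTBlock(dim) for _ in range(num_modalities))
        self.fuse = nn.Conv2d(num_modalities * dim, dim, kernel_size=1)

    def forward(self, x):                        # x: (B, modalities, H, W)
        outs = []
        for m, (embed, dct) in enumerate(zip(self.embeds, self.dcts)):
            f = embed(x[:, m:m + 1])             # (B, dim, H/p, W/p)
            b, c, h, w = f.shape
            tokens = dct(f.flatten(2).transpose(1, 2))
            outs.append(tokens.transpose(1, 2).reshape(b, c, h, w))
        return self.fuse(torch.cat(outs, dim=1))  # multimodal fusion features
```

For example, `MPE(num_modalities=2)(torch.randn(1, 2, 64, 64))` yields a (1, 64, 16, 16) fusion map; per the abstract, such fusion features would then be delivered (e.g. downsampled) to several levels of the CNN encoder.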
Related papers
- Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network that combines convolutional neural network (CNN) and Transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z)
- Hybrid-Fusion Transformer for Multisequence MRI [2.082367820170703]
We propose a novel hybrid fusion Transformer (HFTrans) for multisequence MRI image segmentation.
We take advantage of the differences among multimodal MRI sequences and utilize the Transformer layers to integrate the features extracted from each modality.
We validate the effectiveness of our hybrid-fusion method in three-dimensional (3D) medical segmentation.
arXiv Detail & Related papers (2023-11-02T15:22:49Z)
- MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing [88.61523825903998]
Transformer networks are beginning to replace pure convolutional neural networks (CNNs) in the field of computer vision.
We propose a new Transformer variant, which applies the Taylor expansion to approximate the softmax-attention and achieves linear computational complexity.
We introduce a multi-branch architecture with multi-scale patch embedding to the proposed Transformer, which embeds features by overlapping deformable convolution of different scales.
Our model, named Multi-branch Transformer expanded by Taylor formula (MB-TaylorFormer), can embed coarse-to-fine features more flexibly at the patch embedding stage and capture long-distance pixel interactions with limited computational cost; a sketch of the Taylor-approximated attention follows this entry.
arXiv Detail & Related papers (2023-08-27T08:10:23Z)
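The linear-complexity claim in this entry rests on a trick that is easy to sketch: expand the softmax kernel to first order, exp(q·k) ≈ 1 + q·k, after which the attention sums can be rearranged to cost O(N·d²) instead of O(N²·d). The function below is a generic illustration of that trick under assumed shapes and normalization, not MB-TaylorFormer's exact formulation.

```python
# Generic sketch of linear attention via a first-order Taylor expansion
# of the softmax kernel, exp(q.k) ~= 1 + q.k. Shapes and normalization
# are assumptions; this is not the MB-TaylorFormer implementation.
import torch
import torch.nn.functional as F


def taylor_attention(q, k, v):
    """q, k, v: (batch, tokens, dim); returns (batch, tokens, dim).

    With weights w_ij = 1 + q_i . k_j, output_i = sum_j w_ij v_j / sum_j w_ij.
    Precomputing sum_j k_j v_j^T makes the cost O(N d^2), not O(N^2 d).
    """
    # unit-normalize so q.k is bounded and the weights 1 + q.k stay >= 0
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    n = k.shape[1]
    kv = torch.einsum("bnd,bne->bde", k, v)          # sum_j k_j v_j^T
    num = v.sum(dim=1, keepdim=True) + torch.einsum("bnd,bde->bne", q, kv)
    den = n + torch.einsum("bnd,bd->bn", q, k.sum(dim=1))
    return num / den.unsqueeze(-1).clamp_min(1e-6)
```

Because everything to the right of the query is independent of the query index, the sequence length enters the cost only linearly, which is the property the summary refers to.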
- ConvTransSeg: A Multi-resolution Convolution-Transformer Network for Medical Image Segmentation [14.485482467748113]
We propose a hybrid encoder-decoder segmentation model (ConvTransSeg).
It consists of a multi-layer CNN as the encoder for feature learning and the corresponding multi-level Transformer as the decoder for segmentation prediction.
Our method achieves the best performance in terms of Dice coefficient and average symmetric surface distance measures with low model complexity and memory consumption.
arXiv Detail & Related papers (2022-10-13T14:59:23Z)
- HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation [3.478921293603811]
HiFormer is a novel method that efficiently bridges a CNN and a transformer for medical image segmentation.
To secure a fine fusion of global and local features, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure.
arXiv Detail & Related papers (2022-07-18T11:30:06Z)
- mmFormer: Multimodal Medical Transformer for Incomplete Multimodal Learning of Brain Tumor Segmentation [38.22852533584288]
We propose a novel Medical Transformer (mmFormer) for incomplete multimodal learning with three main components.
The proposed mmFormer outperforms the state-of-the-art methods for incomplete multimodal brain tumor segmentation on almost all subsets of incomplete modalities.
arXiv Detail & Related papers (2022-06-06T08:41:56Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that combines the detailed spatial information captured by CNNs with the global context provided by Transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- TransCMD: Cross-Modal Decoder Equipped with Transformer for RGB-D Salient Object Detection [86.94578023985677]
In this work, we rethink this task from the perspective of global information alignment and transformation.
Specifically, the proposed method (TransCMD) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path.
Experimental results on seven RGB-D SOD benchmark datasets demonstrate that a simple two-stream encoder-decoder framework can surpass the state-of-the-art purely CNN-based methods.
arXiv Detail & Related papers (2021-12-04T15:45:34Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the Transformer is a uniform operation that is highly effective at both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.