Boosting Convolution with Efficient MLP-Permutation for Volumetric
Medical Image Segmentation
- URL: http://arxiv.org/abs/2303.13111v3
- Date: Thu, 24 Aug 2023 15:03:57 GMT
- Title: Boosting Convolution with Efficient MLP-Permutation for Volumetric
Medical Image Segmentation
- Authors: Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen
- Abstract summary: Multi-layer perceptron (MLP) networks have regained popularity among researchers due to their comparable results to ViT.
We propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolutional neural networks (CNNs) and MLP.
- Score: 32.645022002807416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, the advent of vision Transformer (ViT) has brought substantial
advancements in 3D dataset benchmarks, particularly in 3D volumetric medical
image segmentation (Vol-MedSeg). Concurrently, multi-layer perceptron (MLP)
networks have regained popularity among researchers due to their comparable
results to ViT, albeit with the exclusion of the resource-intensive
self-attention module. In this work, we propose a novel permutable hybrid
network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both
convolutional neural networks (CNNs) and MLP. PHNet addresses the intrinsic
isotropy problem of 3D volumetric data by employing a combination of 2D and 3D
CNNs to extract local features. Besides, we propose an efficient multi-layer
permute perceptron (MLPP) module that captures long-range dependence while
preserving positional information. This is achieved through an axis
decomposition operation that permutes the input tensor along different axes,
thereby enabling the separate encoding of the positional information.
Furthermore, MLPP tackles the resolution sensitivity issue of MLP in Vol-MedSeg
with a token segmentation operation, which divides the feature into smaller
tokens and processes them individually. Extensive experimental results validate
that PHNet outperforms the state-of-the-art methods with lower computational
costs on the widely-used yet challenging COVID-19-20 and Synapse benchmarks.
The ablation study also demonstrates the effectiveness of PHNet in harnessing
the strengths of both CNNs and MLP.
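The axis decomposition and token segmentation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`segmented_axis_mlp`, `mlpp_sketch`), the single linear map per axis, the summation of the three branches, and the `(C, D, H, W)` layout are all assumptions made for illustration.

```python
import numpy as np


def segmented_axis_mlp(x, weight, axis, token_len):
    """Mix features along one spatial axis, token by token.

    x:         array of shape (C, D, H, W) (assumed layout)
    weight:    (token_len, token_len) linear map shared by all tokens
    axis:      spatial axis to mix along (1, 2, or 3)
    token_len: length of each token along that axis
    """
    # Permute: bring the chosen axis to the last position.
    moved = np.moveaxis(x, axis, -1)
    length = moved.shape[-1]
    if length % token_len != 0:
        raise ValueError("axis length must be divisible by token_len")
    # Token segmentation: split the axis into chunks of token_len.
    tokens = moved.reshape(*moved.shape[:-1], length // token_len, token_len)
    # The linear layer acts inside each token only.
    mixed = tokens @ weight
    # Undo the segmentation and the permutation.
    mixed = mixed.reshape(*moved.shape[:-1], length)
    return np.moveaxis(mixed, -1, axis)


def mlpp_sketch(x, weights, token_len):
    """Sum one branch per spatial axis (D, H, W); the fusion rule is assumed."""
    out = np.zeros_like(x)
    for axis, weight in zip((1, 2, 3), weights):
        out += segmented_axis_mlp(x, weight, axis, token_len)
    return out


rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 4)) for _ in range(3)]

# The same weights apply to volumes of different resolution,
# because the linear map is sized to the token, not to the volume.
small = mlpp_sketch(rng.standard_normal((2, 8, 12, 16)), weights, token_len=4)
large = mlpp_sketch(rng.standard_normal((2, 16, 24, 32)), weights, token_len=4)
print(small.shape, large.shape)
```

Because each weight matrix is sized to the token rather than to the full axis, the sketch runs unchanged on inputs of different resolution, which is the resolution-sensitivity issue the abstract says token segmentation is meant to address.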
Related papers
- SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
- E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D
Medical Image Segmentation [36.367368163120794]
We propose a 3D medical image segmentation model, named Efficient to Efficient Network (E2ENet).
It incorporates two parametrically and computationally efficient designs.
It consistently achieves a superior trade-off between accuracy and efficiency across various resource constraints.
arXiv Detail & Related papers (2023-12-07T22:13:37Z)
- Dynamic Spectrum Mixer for Visual Recognition [17.180863898764194]
We propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM).
DSM represents token interactions in the frequency domain by employing the Cosine Transform.
It can learn long-term spatial dependencies with log-linear complexity.
arXiv Detail & Related papers (2023-09-13T04:51:15Z)
- 3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers [101.44668514239959]
We propose a hybrid encoder-decoder framework that efficiently computes spatial and temporal attentions in parallel.
We also introduce a semantic clutter-background adversarial loss during training that aids in delineating the region of mitochondria instances from the background.
arXiv Detail & Related papers (2023-03-21T17:58:49Z)
- Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data [2.207533492015563]
We present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics.
These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training.
We demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks.
arXiv Detail & Related papers (2023-03-01T09:27:08Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks and efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- ToDD: Topological Compound Fingerprinting in Computer-Aided Drug
Discovery [8.620443111346523]
In computer-aided drug discovery (CADD), virtual screening is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds.
To address this problem, we developed a novel method using multi parameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors.
We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates.
arXiv Detail & Related papers (2022-11-07T19:00:05Z)
- The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in
Transformers [59.87030906486969]
This paper studies the curious phenomenon that the activation maps of machine learning models with Transformer architectures are sparse.
We show that sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks.
We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers.
arXiv Detail & Related papers (2022-10-12T15:25:19Z)
- Multi-Slice Dense-Sparse Learning for Efficient Liver and Tumor
Segmentation [4.150096314396549]
Deep convolutional neural networks (DCNNs) have achieved tremendous success in 2D and 3D medical image segmentation.
We propose a novel dense-sparse training flow from a data perspective, in which densely adjacent slices and sparsely adjacent slices are extracted as inputs for regularizing DCNNs.
We also design a 2.5D light-weight nnU-Net from a network perspective, in which depthwise separable convolutions are adopted to improve efficiency.
arXiv Detail & Related papers (2021-08-15T15:29:48Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Fourier Features Let Networks Learn High Frequency Functions in Low
Dimensional Domains [69.62456877209304]
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions.
Results shed light on advances in computer vision and graphics that achieve state-of-the-art results.
arXiv Detail & Related papers (2020-06-18T17:59:11Z)