Related papers: Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation

Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation

URL: http://arxiv.org/abs/2303.13111v3
Date: Thu, 24 Aug 2023 15:03:57 GMT
Title: Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation
Authors: Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen
Abstract summary: Multi-layer perceptron (MLP) network has regained popularity among researchers due to their comparable results to ViT. We propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolution neural networks (CNNs) and PHNet.
Score: 32.645022002807416
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, the advent of vision Transformer (ViT) has brought substantial advancements in 3D dataset benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg). Concurrently, multi-layer perceptron (MLP) network has regained popularity among researchers due to their comparable results to ViT, albeit with the exclusion of the resource-intensive self-attention module. In this work, we propose a novel permutable hybrid network for Vol-MedSeg, named PHNet, which capitalizes on the strengths of both convolution neural networks (CNNs) and MLP. PHNet addresses the intrinsic isotropy problem of 3D volumetric data by employing a combination of 2D and 3D CNNs to extract local features. Besides, we propose an efficient multi-layer permute perceptron (MLPP) module that captures long-range dependence while preserving positional information. This is achieved through an axis decomposition operation that permutes the input tensor along different axes, thereby enabling the separate encoding of the positional information. Furthermore, MLPP tackles the resolution sensitivity issue of MLP in Vol-MedSeg with a token segmentation operation, which divides the feature into smaller tokens and processes them individually. Extensive experimental results validate that PHNet outperforms the state-of-the-art methods with lower computational costs on the widely-used yet challenging COVID-19-20 and Synapse benchmarks. The ablation study also demonstrates the effectiveness of PHNet in harnessing the strengths of both CNNs and MLP.

Related papers

Deep-Unrolling Multidimensional Harmonic Retrieval Algorithms on Neuromorphic Hardware [78.17783007774295]
This paper explores the potential of conversion-based neuromorphic algorithms for highly accurate and energy-efficient single-snapshot multidimensional harmonic retrieval. A novel method for converting the complex-valued convolutional layers and activations into spiking neural networks (SNNs) is developed. The converted SNNs achieve almost five-fold power efficiency at moderate performance loss compared to the original CNNs.
arXiv Detail & Related papers (2024-12-05T09:41:33Z)
SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification. The proposed framework has been validated through comprehensive experiments on two clinical datasets. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z)
Efficient 3D affinely equivariant CNNs with adaptive fusion of augmented spherical Fourier-Bessel bases [0.36122488107441414]
Filter-decomposition-based group equivariant convolutional neural networks (CNNs) have shown promising stability and data efficiency for 3D image feature extraction. This paper presents an efficient non- parameter-sharing continuous 3D affine group equivariant neural network for volumetric images.
arXiv Detail & Related papers (2024-02-26T18:51:15Z)
E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation [36.367368163120794]
We propose a 3D medical image segmentation model, named Efficient to Efficient Network (E2ENet) It incorporates two parametrically and computationally efficient designs. It consistently achieves a superior trade-off between accuracy and efficiency across various resource constraints.
arXiv Detail & Related papers (2023-12-07T22:13:37Z)
Dynamic Spectrum Mixer for Visual Recognition [17.180863898764194]
We propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM) DSM represents token interactions in the frequency domain by employing the Cosine Transform. It can learn long-term spatial dependencies with log-linear complexity.
arXiv Detail & Related papers (2023-09-13T04:51:15Z)
3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers [101.44668514239959]
We propose a hybrid encoder-decoder framework that efficiently computes spatial and temporal attentions in parallel. We also introduce a semantic clutter-background adversarial loss during training that aids in the region of mitochondria instances from the background.
arXiv Detail & Related papers (2023-03-21T17:58:49Z)
Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data [2.207533492015563]
We present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics. These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training. We demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks.
arXiv Detail & Related papers (2023-03-01T09:27:08Z)
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features. Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery [8.620443111346523]
In computer-aided drug discovery (CADD), virtual screening is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. To address this problem, we developed a novel method using multi parameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates.
arXiv Detail & Related papers (2022-11-07T19:00:05Z)
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers [59.87030906486969]
This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse. We show that sparsity is a prevalent phenomenon that occurs for both natural language processing and vision tasks. We discuss how sparsity immediately implies a way to significantly reduce the FLOP count and improve efficiency for Transformers.
arXiv Detail & Related papers (2022-10-12T15:25:19Z)
Multi-Slice Dense-Sparse Learning for Efficient Liver and Tumor Segmentation [4.150096314396549]
Deep convolutional neural network (DCNNs) has obtained tremendous success in 2D and 3D medical image segmentation. We propose a novel dense-sparse training flow from a data perspective, in which, densely adjacent slices and sparsely adjacent slices are extracted as inputs for regularizing DCNNs. We also design a 2.5D light-weight nnU-Net from a network perspective, in which, depthwise separable convolutions are adopted to improve the efficiency.
arXiv Detail & Related papers (2021-08-15T15:29:48Z)
Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity. Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains [69.62456877209304]
We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions. Results shed light on advances in computer vision and graphics that achieve state-of-the-art results.
arXiv Detail & Related papers (2020-06-18T17:59:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.