UNeXt: MLP-based Rapid Medical Image Segmentation Network
- URL: http://arxiv.org/abs/2203.04967v1
- Date: Wed, 9 Mar 2022 18:58:22 GMT
- Title: UNeXt: MLP-based Rapid Medical Image Segmentation Network
- Authors: Jeya Maria Jose Valanarasu and Vishal M. Patel
- Abstract summary: UNet and its latest extensions like TransUNet have been the leading medical image segmentation methods in recent years.
We propose UNeXt, a convolutional multilayer perceptron (MLP) based network for image segmentation.
We show that we reduce the number of parameters by 72x, decrease the computational complexity by 68x, and improve the inference speed by 10x while also obtaining better segmentation performance.
- Score: 80.16644725886968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: UNet and its latest extensions like TransUNet have been the leading medical
image segmentation methods in recent years. However, these networks cannot be
effectively adopted for rapid image segmentation in point-of-care applications
as they are parameter-heavy, computationally complex and slow to use. To this
end, we propose UNeXt, a convolutional multilayer perceptron (MLP) based
network for image segmentation. We design UNeXt effectively, with an early
convolutional stage followed by an MLP stage in the latent space. We propose a
tokenized MLP block in which the convolutional features are efficiently
tokenized and projected, and MLPs model the representation. To further boost
performance, we propose shifting the channels of the inputs while feeding them
into the MLPs, so as to focus on learning local dependencies. Using tokenized
MLPs in the latent space reduces the number of parameters and the computational
complexity while yielding a better representation that aids segmentation. The
network also includes skip connections between various levels of the encoder and
decoder. We test UNeXt on multiple medical image segmentation datasets and show
that we reduce the number of parameters by 72x, decrease the computational
complexity by 68x, and improve the inference speed by 10x while also obtaining
better segmentation performance over the state-of-the-art medical image
segmentation architectures. Code is available at
https://github.com/jeya-maria-jose/UNeXt-pytorch
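The tokenized MLP block described in the abstract can be pictured with a minimal NumPy sketch: channel groups are shifted spatially, the feature map is flattened into tokens and projected, and an MLP mixes information across the token axis with a residual connection. The shapes, shift offsets, activation, and weight initialization below are illustrative assumptions, not the paper's exact design; see the linked repository for the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, E = 8, 4, 4, 16          # channels, height, width, embedding dim (illustrative)

def shift_channels(x, offsets=(-2, -1, 0, 1, 2)):
    """Shift groups of channels along the width axis before tokenization.
    x: (C, H, W). Group sizes and offsets are illustrative choices."""
    out = np.empty_like(x)
    groups = np.array_split(np.arange(x.shape[0]), len(offsets))
    for off, idx in zip(offsets, groups):
        out[idx] = np.roll(x[idx], off, axis=2)
    return out

# Illustrative weights: a channel-to-embedding projection and a token-mixing MLP.
w_proj = rng.standard_normal((C, E)) * 0.1
w_mix = rng.standard_normal((H * W, H * W)) * 0.1

def tokenized_mlp_block(x):
    """Tokenize shifted conv features, then mix across the token axis."""
    tokens = shift_channels(x).reshape(C, H * W).T @ w_proj  # (H*W, E) tokens
    mixed = np.maximum(0.0, w_mix @ tokens)                  # MLP over tokens (ReLU here)
    return tokens + mixed                                    # residual connection

x = rng.standard_normal((C, H, W))
y = tokenized_mlp_block(x)
print(y.shape)  # (16, 16): H*W tokens, E embedding channels
```

Because the MLP weight acts on the (small) token axis in latent space rather than on full-resolution feature maps, the parameter count stays low, which is the efficiency argument the abstract makes.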
Related papers
- BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons [37.28828605119602]
This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs).
We find that previous binarization methods perform poorly due to the limited capacity of binary samplings.
We propose the binary mixing and channel-mixing (BiMLP) model, which improves performance by enriching the representation ability of binary FC layers.
arXiv Detail & Related papers (2022-12-29T02:43:41Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP [52.25478388220691]
Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks.
They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.
We propose a new positional spatial gating unit (PoSGU) to efficiently encode cross-token relations for token mixing.
arXiv Detail & Related papers (2022-07-15T04:18:06Z)
- CenterCLIP: Token Clustering for Efficient Text-Video Retrieval [67.21528544724546]
In CLIP, the essential visual tokenization process, which produces discrete visual token sequences, generates many homogeneous tokens due to the redundant nature of consecutive frames in videos.
This significantly increases computation costs and hinders the deployment of video retrieval models in web applications.
In this paper, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.
arXiv Detail & Related papers (2022-05-02T12:02:09Z)
- CoordX: Accelerating Implicit Neural Representation with a Split MLP Architecture [2.6912336656165805]
Implicit neural representations with multi-layer perceptrons (MLPs) have recently gained prominence for a wide variety of tasks.
We propose a new split architecture, CoordX, to accelerate inference and training of coordinate-based representations.
We demonstrate a speedup of up to 2.92x compared to the baseline model for image, video, and 3D shape representation and rendering tasks.
arXiv Detail & Related papers (2022-01-28T21:30:42Z)
- RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer.
RepMLPNet is the first MLP model that transfers seamlessly to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
- ConvMLP: Hierarchical Convolutional MLPs for Vision [7.874749885641495]
We propose ConvMLP, a hierarchical convolutional MLP: a lightweight, stage-wise co-design of convolution layers and MLPs for visual recognition.
We show that ConvMLP can be seamlessly transferred and achieve competitive results with fewer parameters.
arXiv Detail & Related papers (2021-09-09T17:52:57Z)
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [7.901786481399378]
Mixture-of-Experts (MoE) with sparse conditional computation has proven to be an effective architecture for scaling attention-based models to more parameters at comparable computation cost.
We propose Sparse-MLP, which scales the recent MLP-Mixer model with MoE to achieve a more efficient architecture.
arXiv Detail & Related papers (2021-09-05T06:43:08Z)
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [123.59890802196797]
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition.
We construct convolutional layers inside a RepMLP during training and merge them into the FC for inference.
By inserting RepMLP in traditional CNN, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs.
arXiv Detail & Related papers (2021-05-05T06:17:40Z)
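The conv-to-FC merging described in the RepMLP entry above rests on the fact that a convolution is a linear map, so it can be rewritten as an equivalent fully-connected layer. A minimal NumPy sketch of that idea follows; the single-channel 3x3 "same" convolution and 4x4 input are illustrative assumptions, not RepMLP's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 4
kernel = rng.standard_normal((3, 3))  # illustrative 3x3 kernel, single channel

def conv3x3(x, k):
    """Naive zero-padded 3x3 cross-correlation on an (H, W) map."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

# Merge the conv into an FC weight: since conv is linear, column c of the
# equivalent FC matrix is the conv's response to the unit impulse e_c.
W_fc = np.zeros((H * W, H * W))
for c in range(H * W):
    e = np.zeros(H * W)
    e[c] = 1.0
    W_fc[:, c] = conv3x3(e.reshape(H, W), kernel).ravel()

x = rng.standard_normal((H, W))
print(np.allclose(W_fc @ x.ravel(), conv3x3(x, kernel).ravel()))  # True
```

Training with the convolutional form injects a local prior, while inference uses only the merged FC weight, which is the locality-injection trade-off these re-parameterization papers exploit.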
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.