UNeXt: MLP-based Rapid Medical Image Segmentation Network
- URL: http://arxiv.org/abs/2203.04967v1
- Date: Wed, 9 Mar 2022 18:58:22 GMT
- Title: UNeXt: MLP-based Rapid Medical Image Segmentation Network
- Authors: Jeya Maria Jose Valanarasu and Vishal M. Patel
- Abstract summary: UNet and its latest extensions like TransUNet have been the leading medical image segmentation methods in recent years.
We propose UNeXt, a convolutional multilayer perceptron (MLP) based network for image segmentation.
We show that we reduce the number of parameters by 72x, decrease the computational complexity by 68x, and improve the inference speed by 10x while also obtaining better segmentation performance.
- Score: 80.16644725886968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: UNet and its latest extensions like TransUNet have been the leading medical
image segmentation methods in recent years. However, these networks cannot be
effectively adopted for rapid image segmentation in point-of-care applications
as they are parameter-heavy, computationally complex and slow to use. To this
end, we propose UNeXt, a convolutional multilayer perceptron (MLP) based
network for image segmentation. We design UNeXt effectively, with an early
convolutional stage followed by an MLP stage in the latent space. We propose a
tokenized MLP block in which the convolutional features are efficiently
tokenized and projected, and MLPs model the representation. To further boost
performance, we propose shifting the channels of the inputs while feeding them
into the MLPs, so as to focus on learning local dependencies. Using tokenized
MLPs in the latent space reduces the number of parameters and the computational
complexity while yielding a better representation that aids segmentation. The
network also includes skip connections between various levels of the encoder and
decoder. We test UNeXt on multiple medical image segmentation datasets and show
that we reduce the number of parameters by 72x, decrease the computational
complexity by 68x, and improve the inference speed by 10x while also obtaining
better segmentation performance over the state-of-the-art medical image
segmentation architectures. Code is available at
https://github.com/jeya-maria-jose/UNeXt-pytorch
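The tokenized MLP block described in the abstract can be pictured with a minimal NumPy sketch: channel groups are shifted spatially, the feature map is flattened into tokens and projected, and an MLP mixes information across the token axis with a residual connection. The shapes, shift offsets, activation, and weight initialization below are illustrative assumptions, not the paper's exact design; see the linked repository for the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, E = 8, 4, 4, 16          # channels, height, width, embedding dim (illustrative)

def shift_channels(x, offsets=(-2, -1, 0, 1, 2)):
    """Shift groups of channels along the width axis before tokenization.
    x: (C, H, W). Group sizes and offsets are illustrative choices."""
    out = np.empty_like(x)
    groups = np.array_split(np.arange(x.shape[0]), len(offsets))
    for off, idx in zip(offsets, groups):
        out[idx] = np.roll(x[idx], off, axis=2)
    return out

# Illustrative weights: a channel-to-embedding projection and a token-mixing MLP.
w_proj = rng.standard_normal((C, E)) * 0.1
w_mix = rng.standard_normal((H * W, H * W)) * 0.1

def tokenized_mlp_block(x):
    """Tokenize shifted conv features, then mix across the token axis."""
    tokens = shift_channels(x).reshape(C, H * W).T @ w_proj  # (H*W, E) tokens
    mixed = np.maximum(0.0, w_mix @ tokens)                  # MLP over tokens (ReLU here)
    return tokens + mixed                                    # residual connection

x = rng.standard_normal((C, H, W))
y = tokenized_mlp_block(x)
print(y.shape)  # (16, 16): H*W tokens, E embedding channels
```

Because the MLP weight acts on the (small) token axis in latent space rather than on full-resolution feature maps, the parameter count stays low, which is the efficiency argument the abstract makes.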
Related papers
- BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons [37.28828605119602]
This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs).
We find that previous binarization methods perform poorly due to the limited capacity of binary samplings.
We propose the binary mixing and channel-mixing (BiMLP) model, which improves performance by enriching the representation ability of binary FC layers.
arXiv Detail & Related papers (2022-12-29T02:43:41Z)
- UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
- Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP [52.25478388220691]
Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks.
They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.
We propose a new positional spatial gating unit (PoSGU) to efficiently encode cross-token relations for token mixing.
arXiv Detail & Related papers (2022-07-15T04:18:06Z)
- CenterCLIP: Token Clustering for Efficient Text-Video Retrieval [67.21528544724546]
In CLIP, the essential visual tokenization process, which produces discrete visual token sequences, generates many homogeneous tokens due to the redundant nature of consecutive frames in videos.
This significantly increases computation costs and hinders the deployment of video retrieval models in web applications.
In this paper, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.
arXiv Detail & Related papers (2022-05-02T12:02:09Z)
- CoordX: Accelerating Implicit Neural Representation with a Split MLP Architecture [2.6912336656165805]
Implicit neural representations with multi-layer perceptrons (MLPs) have recently gained prominence for a wide variety of tasks.
We propose a new split architecture, CoordX, to accelerate inference and training of coordinate-based representations.
We demonstrate a speedup of up to 2.92x compared to the baseline model for image, video, and 3D shape representation and rendering tasks.
arXiv Detail & Related papers (2022-01-28T21:30:42Z)
- RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality [113.1414517605892]
We propose a methodology, Locality Injection, to incorporate local priors into an FC layer.
RepMLPNet is the first MLP model that transfers seamlessly to Cityscapes semantic segmentation.
arXiv Detail & Related papers (2021-12-21T10:28:17Z)
- ConvMLP: Hierarchical Convolutional MLPs for Vision [7.874749885641495]
We propose ConvMLP, a hierarchical convolutional MLP: a lightweight, stage-wise co-design of convolution layers and MLPs for visual recognition.
We show that ConvMLP can be seamlessly transferred and achieve competitive results with fewer parameters.
arXiv Detail & Related papers (2021-09-09T17:52:57Z)
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation [7.901786481399378]
Mixture-of-Experts (MoE) with sparse conditional computation has proven to be an effective architecture for scaling attention-based models to more parameters at comparable computation cost.
We propose Sparse-MLP, which scales the recent MLP-Mixer model with MoE to achieve a more efficient architecture.
arXiv Detail & Related papers (2021-09-05T06:43:08Z)
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [123.59890802196797]
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition.
We construct convolutional layers inside a RepMLP during training and merge them into the FC for inference.
By inserting RepMLP in traditional CNN, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs.
arXiv Detail & Related papers (2021-05-05T06:17:40Z)
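The conv-to-FC merging described in the RepMLP entry above rests on the fact that a convolution is a linear map, so it can be rewritten as an equivalent fully-connected layer. A minimal NumPy sketch of that idea follows; the single-channel 3x3 "same" convolution and 4x4 input are illustrative assumptions, not RepMLP's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 4
kernel = rng.standard_normal((3, 3))  # illustrative 3x3 kernel, single channel

def conv3x3(x, k):
    """Naive zero-padded 3x3 cross-correlation on an (H, W) map."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

# Merge the conv into an FC weight: since conv is linear, column c of the
# equivalent FC matrix is the conv's response to the unit impulse e_c.
W_fc = np.zeros((H * W, H * W))
for c in range(H * W):
    e = np.zeros(H * W)
    e[c] = 1.0
    W_fc[:, c] = conv3x3(e.reshape(H, W), kernel).ravel()

x = rng.standard_normal((H, W))
print(np.allclose(W_fc @ x.ravel(), conv3x3(x, kernel).ravel()))  # True
```

Training with the convolutional form injects a local prior, while inference uses only the merged FC weight, which is the locality-injection trade-off these re-parameterization papers exploit.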
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.