MEW-UNet: Multi-axis representation learning in frequency domain for
medical image segmentation
- URL: http://arxiv.org/abs/2210.14007v1
- Date: Tue, 25 Oct 2022 13:22:41 GMT
- Title: MEW-UNet: Multi-axis representation learning in frequency domain for
medical image segmentation
- Authors: Jiacheng Ruan, Mingye Xie, Suncheng Xiang, Ting Liu, Yuzhuo Fu
- Abstract summary: We propose Multi-axis External Weights UNet (MEW-UNet) for medical image segmentation (MIS) based on the U-shape architecture.
Specifically, its core block performs a Fourier transform along the three axes of the input feature and applies external weights in the frequency domain.
We evaluate the model on four datasets and achieve state-of-the-art performance.
- Score: 13.456935850832565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Vision Transformer (ViT) has been widely used across computer vision because its self-attention mechanism models global knowledge in the spatial domain. In medical image segmentation (MIS) in particular, many works are devoted to combining ViT and CNN, and some directly adopt pure ViT-based models. However, these works improve models only in the spatial domain and ignore the importance of frequency-domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet)
for MIS, built on the U-shape architecture, in which the self-attention of ViT is replaced with our Multi-axis External Weights block. Specifically, the block performs a Fourier transform along the three axes of the input feature and applies external weights, generated by our Weights Generator, in the frequency domain. An inverse Fourier transform then maps the features back to the spatial domain. We evaluate our model on four datasets and achieve state-of-the-art performance. In particular, on the Synapse dataset, our method outperforms MT-UNet by 10.15 mm in terms of HD95. Code is available
at https://github.com/JCruan519/MEW-UNet.
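To make the block's data flow concrete, here is a minimal PyTorch-style sketch, assuming a (B, C, H, W) feature map; the per-axis learnable weights below stand in for the paper's Weights Generator and are not the authors' exact implementation.

```python
# Minimal sketch of a "multi-axis external weights" block, assuming a
# (B, C, H, W) feature map. The real MEW-UNet block and its Weights
# Generator differ in detail; here each axis gets a learnable frequency-
# domain weight as a stand-in for the generated external weights.
import torch
import torch.nn as nn


class MultiAxisFreqWeights(nn.Module):
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable weight tensor per axis, defined over the rFFT size
        # of that axis (n // 2 + 1 frequency bins).
        self.w_c = nn.Parameter(torch.ones(channels // 2 + 1, 1, 1))
        self.w_h = nn.Parameter(torch.ones(1, height // 2 + 1, 1))
        self.w_w = nn.Parameter(torch.ones(1, 1, width // 2 + 1))

    def _axis_branch(self, x, weight, dim, n):
        # FFT along one axis, scale the spectrum, then transform back.
        spec = torch.fft.rfft(x, dim=dim)
        spec = spec * weight  # broadcasts over the remaining axes
        return torch.fft.irfft(spec, n=n, dim=dim)

    def forward(self, x):
        b, c, h, w = x.shape
        out_c = self._axis_branch(x, self.w_c, dim=1, n=c)
        out_h = self._axis_branch(x, self.w_h, dim=2, n=h)
        out_w = self._axis_branch(x, self.w_w, dim=3, n=w)
        # Fuse the three frequency-weighted views (a simple sum here).
        return out_c + out_h + out_w


x = torch.randn(2, 64, 32, 32)
block = MultiAxisFreqWeights(64, 32, 32)
print(block(x).shape)  # torch.Size([2, 64, 32, 32])
```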
Related papers
- Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis [9.969451740838418]
We introduce Neural Fourier Modelling (NFM), a compact yet powerful solution for time-series analysis.
NFM is grounded in two key properties of the Fourier transform (FT): (i) the ability to model finite-length time series as functions in the Fourier domain, and (ii) the capacity for data manipulation within the Fourier domain.
NFM achieves state-of-the-art performance on a wide range of tasks, including challenging time-series scenarios with previously unseen sampling rates at test time.
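As a toy illustration of the second property, manipulating a finite-length series directly in the Fourier domain, a minimal NumPy sketch follows; NFM itself learns such manipulations rather than hard-coding them.

```python
# Toy example of operating on a finite-length time series in the Fourier
# domain (illustrative only; NFM learns these manipulations).
import numpy as np

t = np.linspace(0.0, 1.0, 256, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)

spec = np.fft.rfft(x)                      # finite series -> frequency coefficients
spec[20:] = 0.0                            # edit in the Fourier domain (low-pass)
x_filtered = np.fft.irfft(spec, n=x.size)  # back to the time domain

print(x_filtered.shape)                    # (256,)
```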
arXiv Detail & Related papers (2024-10-07T02:39:55Z)
- Vim-F: Visual State Space Model Benefiting from Learning in the Frequency Domain [9.458951424465605]
State Space Models (SSMs) with efficient hardware-aware designs, known as the Mamba deep learning models, have made significant progress in modeling long sequences.
We propose a novel model called Vim-F, which employs pure Mamba encoders and scans in both the frequency and spatial domains.
arXiv Detail & Related papers (2024-05-29T01:01:19Z)
- Learning Multi-axis Representation in Frequency Domain for Medical Image Segmentation [5.6980032783048316]
We propose Multi-axis External Weights UNet (MEW-UNet) based on the U-shape architecture.
We evaluate our model on four datasets: Synapse, ACDC, ISIC17, and ISIC18.
arXiv Detail & Related papers (2023-12-28T14:12:31Z)
- DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
We present the Deformable Attention Transformer (DAT++), an efficient and effective vision backbone for visual recognition.
DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
arXiv Detail & Related papers (2023-09-04T08:26:47Z)
- Deep Fourier Up-Sampling [100.59885545206744]
Unlike spatial up-sampling, up-sampling in the Fourier domain is more challenging because it does not follow a local property.
We propose a theoretically sound Deep Fourier Up-Sampling (FourierUp) to solve these issues.
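For reference, the classical baseline for Fourier-domain up-sampling is zero-padding the spectrum; the sketch below shows only that baseline, not FourierUp's proposed operators.

```python
# Classical Fourier zero-padding up-sampling for a 2-D signal (a baseline
# illustration; FourierUp's actual operators are more sophisticated).
import numpy as np

def fourier_upsample(img: np.ndarray, factor: int = 2) -> np.ndarray:
    h, w = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))
    pad_h = (h * (factor - 1)) // 2
    pad_w = (w * (factor - 1)) // 2
    spec = np.pad(spec, ((pad_h, pad_h), (pad_w, pad_w)))
    out = np.fft.ifft2(np.fft.ifftshift(spec)).real
    return out * factor * factor   # keep the original intensity scale

img = np.random.rand(32, 32)
print(fourier_upsample(img, 2).shape)  # (64, 64)
```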
arXiv Detail & Related papers (2022-10-11T06:17:31Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU outperforms previous state-of-the-art methods.
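A generic self-distillation loss, in which a shallow decoder head is trained against a deeper head's soft predictions, gives a rough sense of the idea; MISSU's actual multi-scale distillation scheme differs, and the temperature and weighting below are illustrative placeholders.

```python
# Generic self-distillation between two segmentation heads of one network
# (a simplified stand-in for MISSU's scheme; temperature and weighting are
# illustrative choices, not the paper's settings).
import torch
import torch.nn.functional as F

def self_distill_loss(shallow_logits, deep_logits, labels, T=2.0, alpha=0.5):
    # Hard supervision on the shallow head from the ground-truth mask.
    ce = F.cross_entropy(shallow_logits, labels)
    # Soft supervision: the shallow head mimics the (detached) deep head.
    kd = F.kl_div(
        F.log_softmax(shallow_logits / T, dim=1),
        F.softmax(deep_logits.detach() / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd

shallow = torch.randn(2, 4, 64, 64)   # (B, classes, H, W) logits
deep = torch.randn(2, 4, 64, 64)
labels = torch.randint(0, 4, (2, 64, 64))
print(self_distill_loss(shallow, deep, labels).item())
```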
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Fourier Disentangled Space-Time Attention for Aerial Video Recognition [54.80846279175762]
We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition.
Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent from the background.
We have evaluated our approach on multiple UAV datasets including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone.
arXiv Detail & Related papers (2022-03-21T01:24:53Z)
- Vision Transformer with Deformable Attention [29.935891419574602]
A large, sometimes even global, receptive field endows Transformer models with higher representation power than their CNN counterparts.
We propose a novel deformable self-attention module, where the positions of key and value pairs in self-attention are selected in a data-dependent way.
We present Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks.
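A heavily simplified single-head sketch of data-dependent key/value sampling, assuming (B, H, W, C) features, is given below; the real DAT module uses per-location reference points, multiple heads, offset scaling, and relative position bias.

```python
# Simplified single-head deformable attention (a sketch of the idea only,
# not the DAT implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDeformableAttention(nn.Module):
    def __init__(self, dim: int, n_points: int = 49):
        super().__init__()
        self.scale = dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # Predicts 2-D sampling positions from pooled query features.
        self.offset = nn.Linear(dim, n_points * 2)
        self.n_points = n_points

    def forward(self, x):                                      # x: (B, H, W, C)
        b, h, w, c = x.shape
        q = self.q(x).reshape(b, h * w, c)
        # Data-dependent sampling grid in [-1, 1], one set per image.
        offsets = self.offset(x.mean(dim=(1, 2)))              # (B, P*2)
        grid = torch.tanh(offsets).reshape(b, self.n_points, 1, 2)
        sampled = F.grid_sample(                               # (B, C, P, 1)
            x.permute(0, 3, 1, 2), grid, align_corners=False)
        sampled = sampled.reshape(b, c, self.n_points).transpose(1, 2)
        k, v = self.kv(sampled).chunk(2, dim=-1)               # (B, P, C)
        attn = (q @ k.transpose(1, 2)) * self.scale            # (B, HW, P)
        out = attn.softmax(dim=-1) @ v                         # (B, HW, C)
        return out.reshape(b, h, w, c)

x = torch.randn(2, 14, 14, 64)
print(SimpleDeformableAttention(64)(x).shape)  # torch.Size([2, 14, 14, 64])
```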
arXiv Detail & Related papers (2022-01-03T08:29:01Z)
- Global Filter Networks for Image Classification [90.81352483076323]
We present a conceptually simple yet computationally efficient architecture that learns long-term spatial dependencies in the frequency domain with log-linear complexity.
Our results demonstrate that GFNet can be a very competitive alternative to transformer-style models and CNNs in efficiency, generalization ability and robustness.
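The core global filter operation can be sketched as an FFT, an element-wise multiplication with a learnable frequency-domain filter, and an inverse FFT; the code below is a simplified version, not the official GFNet implementation.

```python
# Global filter layer in the spirit of GFNet: 2-D FFT of token features,
# element-wise multiplication with a learnable frequency-domain filter,
# then inverse FFT (simplified sketch).
import torch
import torch.nn as nn

class GlobalFilter(nn.Module):
    def __init__(self, h: int, w: int, dim: int):
        super().__init__()
        # Complex-valued learnable filter stored as (real, imag) pairs.
        self.filter = nn.Parameter(torch.randn(h, w // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):                       # x: (B, H, W, C)
        spec = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        spec = spec * torch.view_as_complex(self.filter)
        return torch.fft.irfft2(spec, s=x.shape[1:3], dim=(1, 2), norm="ortho")

x = torch.randn(2, 14, 14, 96)
print(GlobalFilter(14, 14, 96)(x).shape)  # torch.Size([2, 14, 14, 96])
```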
arXiv Detail & Related papers (2021-07-01T17:58:16Z)
- Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples.
We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)