Learning Multi-axis Representation in Frequency Domain for Medical Image
Segmentation
- URL: http://arxiv.org/abs/2312.17030v1
- Date: Thu, 28 Dec 2023 14:12:31 GMT
- Title: Learning Multi-axis Representation in Frequency Domain for Medical Image
Segmentation
- Authors: Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Suncheng Xiang
- Abstract summary: We propose Multi-axis External Weights UNet (MEW-UNet) based on the U-shape architecture.
We evaluate our model on four datasets, including Synapse, ACDC, ISIC17 and ISIC18 datasets.
- Score: 6.2246592397835006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Visual Transformer (ViT) has been extensively used in medical image
segmentation (MIS) because it applies a self-attention mechanism in the spatial
domain to model global knowledge. However, many studies have focused on
improving models in the spatial domain while neglecting the importance of
frequency domain information. Therefore, we propose Multi-axis External Weights
UNet (MEW-UNet) based on the U-shape architecture by replacing self-attention
in ViT with our Multi-axis External Weights block. Specifically, our block
performs a Fourier transform on the three axes of the input features and
assigns the external weight in the frequency domain, which is generated by our
External Weights Generator. Then, an inverse Fourier transform is performed to
change the features back to the spatial domain. We evaluate our model on four
datasets, including Synapse, ACDC, ISIC17 and ISIC18 datasets, and our approach
demonstrates competitive performance, owing to its effective utilization of
frequency domain information.
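The transform, weight, inverse-transform sequence described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`weight_along_axis`, `multi_axis_weighting`, `identity_weights`) are hypothetical, the weights are plain arrays standing in for the output of the External Weights Generator, and averaging the three per-axis results is an assumption, since the abstract does not say how the axes are combined.

```python
import numpy as np


def weight_along_axis(x, weight, axis):
    """FFT along one axis, scale by a frequency-domain weight, inverse FFT."""
    n = x.shape[axis]
    spectrum = np.fft.rfft(x, axis=axis)           # real input -> half spectrum
    spectrum = spectrum * weight                    # element-wise external weight
    return np.fft.irfft(spectrum, n=n, axis=axis)   # back to the spatial domain


def multi_axis_weighting(x, weights):
    """Apply the per-axis step to every axis of x and average the results.

    Averaging is an assumption made for this sketch only.
    """
    outputs = [weight_along_axis(x, w, axis) for axis, w in enumerate(weights)]
    return np.mean(outputs, axis=0)


def identity_weights(shape):
    """All-ones weights, one per axis, shaped like the rfft output for that axis.

    Placeholder for weights that a learned generator would produce.
    """
    weights = []
    for axis, n in enumerate(shape):
        w_shape = list(shape)
        w_shape[axis] = n // 2 + 1  # rfft keeps n // 2 + 1 frequency bins
        weights.append(np.ones(w_shape, dtype=np.complex64))
    return weights


# Usage: a (channels, height, width) feature map with pass-through weights.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 6))
filtered = multi_axis_weighting(features, identity_weights(features.shape))
```

With all-ones weights the block returns its input unchanged, which makes a convenient sanity check. Any other weight values reshape the spectrum along the corresponding axis before the inverse transform.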
Related papers
- DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning [5.932234366793244]
Remote sensing change detection (RSCD) aims to identify the changes of interest in a region by analyzing multi-temporal remote sensing images.
Existing RSCD methods are devoted to contextual modeling in the spatial domain to enhance the changes of interest.
We propose DDLNet, an RSCD network based on dual-domain learning (i.e., frequency and spatial domains).
arXiv Detail & Related papers (2024-06-19T14:54:09Z)
- A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models [89.76587063609806]
We study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.
By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on several datasets.
arXiv Detail & Related papers (2023-07-27T06:53:16Z)
- Dynamic Temporal Filtering in Video Models [128.02725199486719]
We present a new recipe for temporal feature learning, namely the Dynamic Temporal Filter (DTF).
DTF learns a specialized frequency filter for every spatial location to model its long-range temporal dynamics.
The DTF block can be plugged into ConvNets and Transformers, yielding DTF-Net and DTF-Transformer.
arXiv Detail & Related papers (2022-11-15T15:59:28Z)
- MEW-UNet: Multi-axis representation learning in frequency domain for medical image segmentation [13.456935850832565]
We propose Multi-axis External Weights UNet (MEW-UNet) for medical image segmentation (MIS) based on the U-shape architecture.
Specifically, our block performs a Fourier transform on the three axes of the input feature and assigns the external weight in the frequency domain.
We evaluate our model on four datasets and achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-25T13:22:41Z)
- Deep Fourier Up-Sampling [100.59885545206744]
Up-sampling in the Fourier domain is more challenging as it does not follow such a local property.
We propose a theoretically sound Deep Fourier Up-Sampling (FourierUp) to solve these issues.
arXiv Detail & Related papers (2022-10-11T06:17:31Z)
- Fourier Disentangled Space-Time Attention for Aerial Video Recognition [54.80846279175762]
We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition.
Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent from the background.
We have evaluated our approach on multiple UAV datasets including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone.
arXiv Detail & Related papers (2022-03-21T01:24:53Z)
- Multidomain Multimodal Fusion For Human Action Recognition Using Inertial Sensors [1.52292571922932]
We propose a novel multidomain multimodal fusion framework that extracts complementary and distinct features from different domains of the input modality.
Features in different domains are extracted by convolutional neural networks (CNNs) and then fused by Canonical Correlation based Fusion (CCF) to improve the accuracy of human action recognition.
arXiv Detail & Related papers (2020-08-22T03:46:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.