Learning Multi-axis Representation in Frequency Domain for Medical Image Segmentation
- URL: http://arxiv.org/abs/2312.17030v2
- Date: Tue, 24 Sep 2024 12:46:17 GMT
- Title: Learning Multi-axis Representation in Frequency Domain for Medical Image Segmentation
- Authors: Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Suncheng Xiang,
- Abstract summary: We propose Multi-axis External Weights UNet (MEW-UNet) based on the U-shape architecture.
We evaluate our model on four datasets, including Synapse, ACDC, ISIC17 and ISIC18 datasets.
- Score: 5.6980032783048316
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Visual Transformer (ViT) has been extensively used in medical image segmentation (MIS) because it applies the self-attention mechanism in the spatial domain to model global knowledge. However, many studies have focused on improving models in the spatial domain while neglecting the importance of frequency domain information. Therefore, we propose Multi-axis External Weights UNet (MEW-UNet), based on the U-shape architecture, by replacing self-attention in ViT with our Multi-axis External Weights block. Specifically, our block performs a Fourier transform on the three axes of the input features and assigns the external weight in the frequency domain, which is generated by our External Weights Generator. Then, an inverse Fourier transform is performed to change the features back to the spatial domain. We evaluate our model on four datasets (Synapse, ACDC, ISIC17 and ISIC18), and our approach demonstrates competitive performance, owing to its effective utilization of frequency domain information.
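The transform, weight, inverse-transform pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `axis_frequency_filter` and `multi_axis_block` are hypothetical, the use of a 1D FFT per axis and the averaging of the three branches are assumptions made for illustration, and the weights are passed in directly because the abstract does not describe how the External Weights Generator produces them.

```python
import numpy as np

def axis_frequency_filter(x, weight, axis):
    """FFT along one axis, multiply by a frequency-domain weight, inverse FFT.

    Keeping only the real part is a simplification made for this sketch.
    """
    spectrum = np.fft.fft(x, axis=axis)
    return np.fft.ifft(spectrum * weight, axis=axis).real

def multi_axis_block(x, weights):
    """Filter a (C, H, W) feature map along each of its three axes.

    `weights` holds one array per axis, each broadcastable to x.shape.
    Averaging the three branches is an assumption, not the paper's fusion rule.
    """
    branches = [axis_frequency_filter(x, w, axis) for axis, w in enumerate(weights)]
    return np.mean(branches, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))  # toy feature map: channels, height, width

# Stand-in for generated external weights: all-ones weights leave x unchanged.
identity_weights = [np.ones(x.shape, dtype=complex) for _ in range(3)]
y = multi_axis_block(x, identity_weights)

# Arbitrary weights change the features but preserve their shape.
random_weights = [rng.standard_normal(x.shape) for _ in range(3)]
z = multi_axis_block(x, random_weights)
```

With all-ones weights the block reduces to the identity, which is a convenient sanity check that the forward and inverse transforms are paired correctly on every axis.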
Related papers
- Spatial and Frequency Domain Adaptive Fusion Network for Image Deblurring [0.0]
Image deblurring aims to reconstruct a latent sharp image from its corresponding blurred one.
We propose a spatial-frequency domain adaptive fusion network (SFAFNet) to address this limitation.
Our SFAFNet performs favorably compared to state-of-the-art approaches on commonly used benchmarks.
arXiv Detail & Related papers (2025-02-20T02:43:55Z) - FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation [50.9040167152168]
We experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system.
We propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain.
To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB).
We develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - Neural Fourier Modelling: A Highly Compact Approach to Time-Series Analysis [9.969451740838418]
We introduce Neural Fourier Modelling (NFM), a compact yet powerful solution for time-series analysis.
NFM is grounded in two key properties of the Fourier transform (FT): (i) the ability to model finite-length time series as functions in the Fourier domain, and (ii) the capacity for data manipulation within the Fourier domain.
NFM achieves state-of-the-art performance on a wide range of tasks, including challenging time-series scenarios with previously unseen sampling rates at test time.
arXiv Detail & Related papers (2024-10-07T02:39:55Z) - A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results compared with state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z) - Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z) - Dynamic Temporal Filtering in Video Models [128.02725199486719]
We present a new recipe for temporal feature learning, namely the Dynamic Temporal Filter (DTF).
DTF learns a specialized frequency filter for every spatial location to model its long-range temporal dynamics.
The DTF block can be plugged into ConvNets and Transformers, yielding DTF-Net and DTF-Transformer.
arXiv Detail & Related papers (2022-11-15T15:59:28Z) - MEW-UNet: Multi-axis representation learning in frequency domain for medical image segmentation [13.456935850832565]
We propose Multi-axis External Weights UNet (MEW-UNet) for medical image segmentation (MIS) based on the U-shape architecture.
Specifically, our block performs a Fourier transform on the three axes of the input feature and assigns the external weight in the frequency domain.
We evaluate our model on four datasets and achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-10-25T13:22:41Z) - Deep Fourier Up-Sampling [100.59885545206744]
Up-sampling in the Fourier domain is more challenging as it does not follow such a local property.
We propose a theoretically sound Deep Fourier Up-Sampling (FourierUp) to solve these issues.
arXiv Detail & Related papers (2022-10-11T06:17:31Z) - Fourier Disentangled Space-Time Attention for Aerial Video Recognition [54.80846279175762]
We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition.
Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent from the background.
We have evaluated our approach on multiple UAV datasets including UAV Human RGB, UAV Human Night, Drone Action, and NEC Drone.
arXiv Detail & Related papers (2022-03-21T01:24:53Z) - Multidomain Multimodal Fusion For Human Action Recognition Using Inertial Sensors [1.52292571922932]
We propose a novel multidomain multimodal fusion framework that extracts complementary and distinct features from different domains of the input modality.
Features in different domains are extracted by Convolutional Neural Networks (CNNs) and then fused by Canonical Correlation based Fusion (CCF) to improve the accuracy of human action recognition.
arXiv Detail & Related papers (2020-08-22T03:46:12Z)