FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation
- URL: http://arxiv.org/abs/2502.03829v1
- Date: Thu, 06 Feb 2025 07:24:34 GMT
- Title: FE-UNet: Frequency Domain Enhanced U-Net with Segment Anything Capability for Versatile Image Segmentation
- Authors: Guohao Huo, Ruiting Dai, Ling Shao, Hao Tang
- Abstract summary: We experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system. We propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain. To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB). We develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block.
- Score: 50.9040167152168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image segmentation is a critical task in visual understanding. Convolutional Neural Networks (CNNs) are predisposed to capture high-frequency features in images, while Transformers exhibit a contrasting focus on low-frequency features. In this paper, we experimentally quantify the contrast sensitivity function of CNNs and compare it with that of the human visual system, informed by the seminal experiments of Mannos and Sakrison. Leveraging these insights, we propose the Wavelet-Guided Spectral Pooling Module (WSPM) to enhance and balance image features across the frequency domain. To further emulate the human visual system, we introduce the Frequency Domain Enhanced Receptive Field Block (FE-RFB), which integrates WSPM to extract enriched features from the frequency domain. Building on these innovations, we develop FE-UNet, a model that utilizes SAM2 as its backbone and incorporates Hiera-Large as a pre-trained block, designed to enhance generalization capabilities while ensuring high segmentation accuracy. Experimental results demonstrate that FE-UNet achieves state-of-the-art performance in diverse tasks, including marine animal and polyp segmentation, underscoring its versatility and effectiveness.
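The abstract refers to the contrast sensitivity function (CSF) from the Mannos and Sakrison experiments without stating it. As a rough illustration, the sketch below evaluates the commonly cited closed form of that CSF, A(f) = 2.6 (0.0192 + 0.114 f) exp(-(0.114 f)^1.1), with f in cycles per degree. The formula, the function name `mannos_sakrison_csf`, and the sampled frequency range are assumptions brought in for illustration; the listing does not say how FE-UNet measures or applies the CSF.

```python
import numpy as np

def mannos_sakrison_csf(f):
    """Relative contrast sensitivity at spatial frequency f (cycles/degree).

    Commonly cited closed form attributed to Mannos and Sakrison;
    it is an assumption here, not taken from the FE-UNet abstract.
    """
    f = np.asarray(f, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

# Sample the curve over an illustrative range of spatial frequencies.
freqs = np.linspace(0.1, 60.0, 600)
sensitivity = mannos_sakrison_csf(freqs)

# The curve is band-pass: sensitivity rises, peaks, then decays.
peak_freq = freqs[np.argmax(sensitivity)]
print(f"peak sensitivity near {peak_freq:.1f} cycles/degree")
```

The band-pass shape is the point of comparison: the human visual system is most sensitive to mid-range spatial frequencies, whereas the abstract describes CNNs as biased toward high frequencies and Transformers toward low ones.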
Related papers
- Self-Bootstrapping for Versatile Test-Time Adaptation [29.616417768209114]
We develop a versatile test-time adaptation (TTA) objective for a variety of tasks.
We achieve this through a self-bootstrapping scheme that optimizes prediction consistency between the test image (as target) and its deteriorated view.
Experiments show that, either independently or as a plug-and-play module, our method achieves superior results across classification, segmentation, and 3D monocular detection tasks.
arXiv Detail & Related papers (2025-04-10T05:45:07Z)
- Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation [33.49405456617909]
We propose the Adaptive Frequency Enhancement Network (AFENet), which integrates two key components: the Adaptive Frequency and Spatial feature Interaction Module (AFSIM) and the Selective feature Fusion Module (SFM).
AFSIM dynamically separates and modulates high- and low-frequency features according to the content of the input image.
SFM selectively fuses global context and local detailed features to enhance the network's representation capability.
arXiv Detail & Related papers (2025-04-03T14:42:49Z)
- Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation [27.576174611043367]
Masked Image Modeling (MIM) has garnered significant attention in self-supervised learning, thanks to its impressive capacity to learn scalable visual representations tailored for downstream tasks.
However, images inherently contain abundant redundant information, leading the pixel-based MIM reconstruction process to focus excessively on finer details such as textures, thus prolonging training times unnecessarily.
In this study, we leverage wavelet transform as a tool for efficient representation learning to expedite the training process of MIM.
arXiv Detail & Related papers (2025-03-02T08:11:26Z)
- Frequency-Spatial Entanglement Learning for Camouflaged Object Detection [34.426297468968485]
Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design.
We propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method.
Our experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets.
arXiv Detail & Related papers (2024-09-03T07:58:47Z)
- Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods.
Our WBANet achieves a percentage of correct classification (PCC) of 98.33%, 96.65%, and 96.62% on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
- An Advanced Features Extraction Module for Remote Sensing Image Super-Resolution [0.5461938536945723]
We propose an advanced feature extraction module called Channel and Spatial Attention Feature Extraction (CSA-FE).
Our proposed method helps the model focus on the specific channels and spatial locations that contain high-frequency information, so that it attends to relevant features and suppresses irrelevant ones.
Our model achieved superior performance compared to various existing models.
arXiv Detail & Related papers (2024-05-07T18:15:51Z)
- Dynamic Spectrum Mixer for Visual Recognition [17.180863898764194]
We propose a content-adaptive yet computationally efficient structure, dubbed Dynamic Spectrum Mixer (DSM).
DSM represents token interactions in the frequency domain by employing the Cosine Transform.
It can learn long-term spatial dependencies with log-linear complexity.
arXiv Detail & Related papers (2023-09-13T04:51:15Z)
- Adaptive Frequency Filters As Efficient Global Token Mixers [100.27957692579892]
We show that adaptive frequency filters can serve as efficient global token mixers.
We take AFF token mixers as primary neural operators to build a lightweight neural network, dubbed AFFNet.
arXiv Detail & Related papers (2023-07-26T07:42:28Z)
- Contextual Learning in Fourier Complex Field for VHR Remote Sensing Images [64.84260544255477]
Transformer-based models have demonstrated outstanding potential for learning high-order contextual relationships from natural images with general resolution (224x224 pixels).
We propose a complex self-attention (CSA) mechanism to model the high-order contextual information with less than half the computation of naive SA.
By stacking various layers of CSA blocks, we propose the Fourier Complex Transformer (FCT) model to learn global contextual information from VHR aerial images.
arXiv Detail & Related papers (2022-10-28T08:13:33Z)
- Masked Frequency Modeling for Self-Supervised Visual Pre-Training [102.89756957704138]
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.
MFM first masks out a portion of frequency components of the input image and then predicts the missing frequencies on the frequency spectrum.
For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model, (iii) mask token.
arXiv Detail & Related papers (2022-06-15T17:58:30Z)
- Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z)
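Several entries above (AFENet, Masked Frequency Modeling, Deep Frequency Filtering) rely on separating or masking frequency components of an image. A minimal sketch of that shared idea follows, assuming a grayscale image and a fixed circular low-pass mask in the 2D Fourier domain. The function name `split_frequencies` and the `radius` parameter are hypothetical, and none of the listed papers is claimed to use this exact procedure (AFSIM, for example, is described as content-adaptive rather than fixed).

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2D image into low- and high-frequency parts.

    Hypothetical helper: a fixed circular mask of the given radius
    (in frequency bins) around the zero-frequency component.
    """
    h, w = image.shape
    # Centre the spectrum so the zero frequency sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    low_mask = dist <= radius
    # Invert each masked spectrum back to the spatial domain.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

# Usage on a random stand-in image (not real data).
rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_frequencies(img, radius=4)
# The two masks are complementary, so the parts sum back to the input.
print(np.allclose(low + high, img))
```

Because the masks are complementary, the decomposition is lossless. A masked-frequency pretext task in the spirit of the MFM entry would hand the network only one of the two parts and ask it to predict the missing spectrum.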
This list is automatically generated from the titles and abstracts of the papers in this site.