Frequency-Adaptive Dilated Convolution for Semantic Segmentation
- URL: http://arxiv.org/abs/2403.05369v6
- Date: Tue, 21 May 2024 14:29:31 GMT
- Title: Frequency-Adaptive Dilated Convolution for Semantic Segmentation
- Authors: Linwei Chen, Lin Gu, Ying Fu
- Abstract summary: We propose three strategies to improve individual phases of dilated convolution from the view of spectrum analysis.
We introduce Frequency-Adaptive Dilated Convolution (FADC), which adjusts dilation rates spatially based on local frequency components.
We design two plug-in modules to directly enhance effective bandwidth and receptive field size.
- Score: 14.066404173580864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dilated convolution, which expands the receptive field by inserting gaps between its consecutive elements, is widely employed in computer vision. In this study, we propose three strategies to improve individual phases of dilated convolution from the view of spectrum analysis. Departing from the conventional practice of fixing a global dilation rate as a hyperparameter, we introduce Frequency-Adaptive Dilated Convolution (FADC), which dynamically adjusts dilation rates spatially based on local frequency components. Subsequently, we design two plug-in modules to directly enhance effective bandwidth and receptive field size. The Adaptive Kernel (AdaKern) module decomposes convolution weights into low-frequency and high-frequency components, dynamically adjusting the ratio between these components on a per-channel basis. By increasing the high-frequency part of convolution weights, AdaKern captures more high-frequency components, thereby improving effective bandwidth. The Frequency Selection (FreqSelect) module optimally balances high- and low-frequency components in feature representations through spatially variant reweighting. It suppresses high frequencies in the background to encourage FADC to learn a larger dilation, thereby increasing the receptive field for an expanded scope. Extensive experiments on segmentation and object detection consistently validate the efficacy of our approach. The code is publicly available at https://github.com/Linwei-Chen/FADC.
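As a rough illustration of two ideas in the abstract, the sketch below computes the span covered by a dilated kernel and splits a single 2D kernel into a low-frequency and a high-frequency part that can be reweighted. The abstract does not say how AdaKern performs the decomposition, so this sketch assumes the low-frequency part is the kernel's spatial mean and the high-frequency part is the residual. The function names and the fixed scalar ratios are hypothetical; in the paper the ratios are adjusted dynamically per channel.

```python
from typing import List, Tuple

Kernel = List[List[float]]


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span of a dilated kernel: (dilation - 1) gaps are inserted between taps."""
    return dilation * (kernel_size - 1) + 1


def split_kernel(kernel: Kernel) -> Tuple[float, Kernel]:
    """Split one 2D kernel into a low- and a high-frequency part.

    Assumption (not stated in the abstract): the low-frequency part is the
    spatial mean of the kernel and the high-frequency part is the residual.
    """
    taps = [w for row in kernel for w in row]
    low = sum(taps) / len(taps)
    high = [[w - low for w in row] for row in kernel]
    return low, high


def reweight_kernel(kernel: Kernel, low_ratio: float, high_ratio: float) -> Kernel:
    """Recombine the two parts with separate ratios (hypothetical helper).

    Raising high_ratio relative to low_ratio emphasizes the high-frequency
    part of the weights, which is the effect the abstract attributes to AdaKern.
    """
    low, high = split_kernel(kernel)
    return [[low_ratio * low + high_ratio * h for h in row] for row in high]


if __name__ == "__main__":
    for d in (1, 2, 4):
        print(f"3x3 kernel, dilation {d}: spans {effective_kernel_size(3, d)} pixels")
    kernel = [[1.0, 2.0], [3.0, 4.0]]
    print(reweight_kernel(kernel, low_ratio=1.0, high_ratio=2.0))
```

With both ratios set to 1.0 the original kernel is recovered, so the reweighting only changes the balance between the two parts and adds no new weights.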
Related papers
- 3D Wavelet Convolutions with Extended Receptive Fields for Hyperspectral Image Classification [12.168520751389622]
Deep neural networks face numerous challenges in hyperspectral image classification.
This paper proposes WCNet, an improved 3D-DenseNet model integrated with wavelet transforms.
Experimental results demonstrate superior performance on the IN, UP, and KSC datasets.
arXiv Detail & Related papers (2025-04-15T01:39:42Z) - Frequency Dynamic Convolution for Dense Image Prediction [34.915070244005854]
We introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates limitations by learning a fixed parameter budget in the Fourier domain.
FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost.
We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters.
arXiv Detail & Related papers (2025-03-24T15:32:06Z) - LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning [47.77830360814755]
Location-aware Cosine Adaptation (LoCA) is a novel frequency-domain parameter-efficient fine-tuning method based on the inverse Discrete Cosine Transform (iDCT).
Our analysis reveals that frequency-domain decomposition with carefully selected frequency components can surpass the expressivity of traditional low-rank-based methods.
Experiments on diverse language and vision fine-tuning tasks demonstrate that LoCA offers enhanced parameter efficiency while maintaining computational feasibility comparable to low-rank-based methods.
arXiv Detail & Related papers (2025-02-05T04:14:34Z) - Frequency-Adaptive Pan-Sharpening with Mixture of Experts [22.28680499480492]
We propose a novel Frequency Adaptive Mixture of Experts (FAME) learning framework for pan-sharpening.
Our method outperforms other state-of-the-art methods and shows strong generalization ability on real-world scenes.
arXiv Detail & Related papers (2024-01-04T08:58:25Z) - Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z) - TWR-MCAE: A Data Augmentation Method for Through-the-Wall Radar Human Motion Recognition [19.7631142728486]
We propose a multilink auto-encoding neural network (TWR-MCAE) data augmentation method.
Experiments show that the proposed algorithm achieves a better peak signal-to-noise ratio (PSNR).
arXiv Detail & Related papers (2023-01-06T12:56:53Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency-domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Multi-Frequency Information Enhanced Channel Attention Module for Speaker Representation Learning [41.44950556040058]
We propose to utilize multi-frequency information and design two novel and effective attention modules.
The proposed attention modules can effectively capture more speaker information from multiple frequency components on the basis of DCT.
Experimental results demonstrate that our proposed SFSC and MFSC attention modules can efficiently generate more discriminative speaker representations.
arXiv Detail & Related papers (2022-07-10T21:19:36Z) - Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z) - FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance, outperforming the state-of-the-art methods by a margin of 3%, 4% and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z) - Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z) - Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement [6.894606865794746]
We propose a dual-branch attention-in-attention transformer dubbed DB-AIAT to handle both coarse- and fine-grained regions of the spectrum in parallel.
Within each branch, we propose a novel attention-in-attention transformer-based module to replace the conventional RNNs and temporal convolutional networks for temporal sequence modeling.
arXiv Detail & Related papers (2021-10-13T03:03:49Z) - Speaker Representation Learning using Global Context Guided Channel and Time-Frequency Transformations [67.18006078950337]
We use the global context information to enhance important channels and recalibrate salient time-frequency locations.
The proposed modules, together with a popular ResNet based model, are evaluated on the VoxCeleb1 dataset.
arXiv Detail & Related papers (2020-09-02T01:07:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.