Frequency Dynamic Convolution for Dense Image Prediction
- URL: http://arxiv.org/abs/2503.18783v2
- Date: Tue, 25 Mar 2025 03:09:17 GMT
- Title: Frequency Dynamic Convolution for Dense Image Prediction
- Authors: Linwei Chen, Lin Gu, Liang Li, Chenggang Yan, Ying Fu
- Abstract summary: We introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates the limitations of existing dynamic convolution by learning a fixed parameter budget in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters.
- Score: 34.915070244005854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Dynamic Convolution (DY-Conv) has shown promising performance by enabling adaptive weight selection through multiple parallel weights combined with an attention mechanism, the frequency responses of these weights tend to exhibit high similarity, resulting in high parameter costs but limited adaptability. In this work, we introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates these limitations by learning a fixed parameter budget in the Fourier domain. FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost. To further enhance adaptability, we propose Kernel Spatial Modulation (KSM) and Frequency Band Modulation (FBM). KSM dynamically adjusts the frequency response of each filter at the spatial level, while FBM decomposes weights into distinct frequency bands in the frequency domain and modulates them dynamically based on local content. Extensive experiments on object detection, segmentation, and classification validate the effectiveness of FDConv. We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters, outperforming previous methods that require substantial increases in parameter budgets (e.g., CondConv +90M, KW +76.5M). Moreover, FDConv seamlessly integrates into a variety of architectures, including ConvNeXt and Swin-Transformer, offering a flexible and efficient solution for modern vision tasks. The code is made publicly available at https://github.com/Linwei-Chen/FDConv.
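The core idea described in the abstract, partitioning a single spectral parameter budget into groups with disjoint Fourier indices and inverse-transforming each group into a distinct spatial kernel, can be sketched roughly as follows. This is a hypothetical illustration of the grouping mechanism only (the function name, radial-frequency ordering, and use of the real part of the inverse FFT are assumptions, not the authors' implementation; see their repository for the actual code):

```python
import numpy as np

def build_frequency_diverse_kernels(spectrum, num_groups):
    """Split one complex spectral parameter tensor into `num_groups`
    disjoint Fourier-index groups and inverse-transform each group,
    yielding frequency-diverse kernels from a single parameter budget."""
    k = spectrum.shape[0]  # kernel size (k x k spectrum)
    fx, fy = np.meshgrid(np.fft.fftfreq(k), np.fft.fftfreq(k))
    # order Fourier indices by radial frequency (low -> high)
    order = np.argsort(np.hypot(fx, fy).ravel())
    groups = np.array_split(order, num_groups)  # disjoint index sets
    kernels = []
    for g in groups:
        masked = np.zeros_like(spectrum)
        masked.ravel()[g] = spectrum.ravel()[g]  # keep one frequency band
        # real part taken for illustration; a real implementation would
        # enforce conjugate symmetry of the spectrum instead
        kernels.append(np.real(np.fft.ifft2(masked)))
    return kernels

rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
kernels = build_frequency_diverse_kernels(spec, num_groups=4)
print(len(kernels), kernels[0].shape)  # 4 kernels share one budget
```

Because the index sets are disjoint, the four kernels occupy separate frequency bands while the total number of learned parameters stays that of a single kernel, which is the parameter-efficiency argument made in the abstract.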
Related papers
- Cross-Frequency Implicit Neural Representation with Self-Evolving Parameters [52.574661274784916]
Implicit neural representation (INR) has emerged as a powerful paradigm for visual data representation.
We propose a self-evolving cross-frequency INR using the Haar wavelet transform (termed CF-INR), which decouples data into four frequency components and employs INRs in the wavelet space.
We evaluate CF-INR on a variety of visual data representation and recovery tasks, including image regression, inpainting, denoising, and cloud removal.
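The Haar wavelet decomposition that CF-INR builds on is standard: a single-level 2D Haar transform splits an image into four frequency components (one coarse approximation and three detail subbands). A minimal sketch of that decomposition, independent of the CF-INR model itself:

```python
import numpy as np

def haar2d(x):
    """Single-level 2D Haar transform: split an even-sized image into
    four frequency components (LL, LH, HL, HH)."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-low: coarse approximation
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)
subbands = haar2d(img)
print([s.shape for s in subbands])  # four half-resolution subbands
```

CF-INR then fits a separate implicit neural representation to each of these four components, which is what "employs INRs in the wavelet space" refers to.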
arXiv Detail & Related papers (2025-04-15T07:14:35Z) - FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off [12.900580256269155]
We propose Fast Multi-Attention Dynamic Convolution (FMDConv), which integrates input attention, temperature-degraded kernel attention, and output attention to optimize the speed-accuracy trade-off.
Experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate that FMDConv reduces the computational cost by up to 49.8% on ResNet-18 and 42.2% on ResNet-50.
arXiv Detail & Related papers (2025-03-21T20:23:32Z) - Multi-frequency wavefield solutions for variable velocity models using meta-learning enhanced low-rank physics-informed neural network [3.069335774032178]
Physics-informed neural networks (PINNs) face significant challenges in modeling multi-frequency wavefields in complex velocity models.
We propose Meta-LRPINN, a novel framework that combines low-rank parameterization with meta-learning and frequency embedding.
Numerical experiments show that Meta-LRPINN achieves much faster convergence and much higher accuracy than baseline methods.
arXiv Detail & Related papers (2025-02-02T20:12:39Z) - FreqMixFormerV2: Lightweight Frequency-aware Mixed Transformer for Human Skeleton Action Recognition [9.963966059349731]
FreqMixFormerV2 is built upon the Frequency-aware Mixed Transformer (FreqMixFormer) for identifying subtle and discriminative actions.
The proposed model achieves a superior balance between efficiency and accuracy, outperforming state-of-the-art methods with only 60% of the parameters.
arXiv Detail & Related papers (2024-12-29T23:52:40Z) - State-Free Inference of State-Space Models: The Transfer Function Approach [132.83348321603205]
State-free inference does not incur any significant memory or computational cost with an increase in state size.
We achieve this using properties of the proposed frequency domain transfer function parametrization.
We report improved perplexity in language modeling over a long convolutional Hyena baseline.
arXiv Detail & Related papers (2024-05-10T00:06:02Z) - Frequency-Adaptive Dilated Convolution for Semantic Segmentation [14.066404173580864]
We propose three strategies to improve individual phases of dilated convolution from the view of spectrum analysis.
We introduce Frequency-Adaptive Dilated Convolution (FADC), which adjusts dilation rates spatially based on local frequency components.
We design two plug-in modules to directly enhance effective bandwidth and receptive field size.
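The spatial adjustment FADC describes, larger dilation where local frequency is low and smaller dilation where it is high, can be illustrated with a crude proxy. This sketch is an assumption-laden illustration, not the FADC method: the Laplacian energy estimate, the rate set, and the mapping are all hypothetical.

```python
import numpy as np

def local_dilation_map(img, rates=(1, 2, 4)):
    """Assign a per-pixel dilation rate: smooth (low-frequency) regions
    get larger dilation, textured (high-frequency) regions smaller."""
    # 3x3 Laplacian high-pass response as a crude local-frequency proxy
    lap = np.abs(
        -4.0 * img
        + np.roll(img, 1, 0) + np.roll(img, -1, 0)
        + np.roll(img, 1, 1) + np.roll(img, -1, 1)
    )
    energy = lap / (lap.max() + 1e-8)  # normalize to [0, 1]
    # invert: low energy -> large dilation index, high energy -> small
    idx = ((1.0 - energy) * (len(rates) - 1e-6)).astype(int)
    return np.asarray(rates)[idx]

img = np.random.default_rng(1).standard_normal((6, 6))
print(local_dilation_map(img))  # per-pixel rates drawn from (1, 2, 4)
```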
arXiv Detail & Related papers (2024-03-08T15:00:44Z) - Frame Flexible Network [52.623337134518835]
Existing video recognition algorithms typically require separate training pipelines for inputs with different frame numbers.
If the model is evaluated with frame numbers not used in training, performance drops significantly.
We propose a general framework, named Frame Flexible Network (FFN), which enables the model to be evaluated at different frames to adjust its computation.
arXiv Detail & Related papers (2023-03-26T20:51:35Z) - FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization [73.41395947275473]
We propose a novel frequency-aware architecture, in which the domain-specific features are filtered out in the transformed frequency domain.
Experiments on three benchmarks demonstrate significant performance gains, outperforming the state-of-the-art methods by margins of 3%, 4%, and 9%, respectively.
arXiv Detail & Related papers (2022-03-24T07:26:29Z) - Mixed Variable Bayesian Optimization with Frequency Modulated Kernels [96.78099706164747]
We propose the frequency modulated (FM) kernel, which flexibly models dependencies among different types of variables.
BO-FM outperforms competitors including Regularized Evolution (RE) and BOHB.
arXiv Detail & Related papers (2021-02-25T11:28:46Z) - Dynamic Region-Aware Convolution [85.20099799084026]
We propose a new convolution called Dynamic Region-Aware Convolution (DRConv), which can automatically assign multiple filters to corresponding spatial regions.
On ImageNet classification, DRConv-based ShuffleNetV2-0.5x achieves state-of-the-art performance of 67.1% at the 46M multiply-adds level, a 6.3% relative improvement.
arXiv Detail & Related papers (2020-03-27T05:49:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.