SFFR: Spatial-Frequency Feature Reconstruction for Multispectral Aerial Object Detection
- URL: http://arxiv.org/abs/2511.06298v2
- Date: Thu, 13 Nov 2025 01:17:31 GMT
- Title: SFFR: Spatial-Frequency Feature Reconstruction for Multispectral Aerial Object Detection
- Authors: Xin Zuo, Yuchen Qu, Haibo Zhan, Jifeng Shen, Wankou Yang,
- Abstract summary: We propose a novel Spatial and Frequency Feature Reconstruction method (SFFR) method.<n>It reconstructs complementary representations in both spatial and frequency domains prior to feature fusion.<n>It is experimentally validated that our proposed FCEKAN and MSGKAN modules are complementary and can effectively capture the frequency and spatial semantic features respectively.
- Score: 12.521255528136278
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent multispectral object detection methods have primarily focused on spatial-domain feature fusion based on CNNs or Transformers, while the potential of frequency-domain feature remains underexplored. In this work, we propose a novel Spatial and Frequency Feature Reconstruction method (SFFR) method, which leverages the spatial-frequency feature representation mechanisms of the Kolmogorov-Arnold Network (KAN) to reconstruct complementary representations in both spatial and frequency domains prior to feature fusion. The core components of SFFR are the proposed Frequency Component Exchange KAN (FCEKAN) module and Multi-Scale Gaussian KAN (MSGKAN) module. The FCEKAN introduces an innovative selective frequency component exchange strategy that effectively enhances the complementarity and consistency of cross-modal features based on the frequency feature of RGB and IR images. The MSGKAN module demonstrates excellent nonlinear feature modeling capability in the spatial domain. By leveraging multi-scale Gaussian basis functions, it effectively captures the feature variations caused by scale changes at different UAV flight altitudes, significantly enhancing the model's adaptability and robustness to scale variations. It is experimentally validated that our proposed FCEKAN and MSGKAN modules are complementary and can effectively capture the frequency and spatial semantic features respectively for better feature fusion. Extensive experiments on the SeaDroneSee, DroneVehicle and DVTOD datasets demonstrate the superior performance and significant advantages of the proposed method in UAV multispectral object perception task. Code will be available at https://github.com/qchenyu1027/SFFR.
Related papers
- FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection [32.27664742588076]
We propose a frequency domain fusion transformer called FreDFT, for visible-infrared object detection.<n>The proposed approach employs a novel multimodal frequency attention (MFDA) to mine complementary information between modalities and a frequency feed-forward layer.<n>Our proposed FreDFT achieves excellent performance on multiple public datasets compared with other state-of-the-art methods.
arXiv Detail & Related papers (2025-11-13T07:46:18Z) - A Spatial-Spectral-Frequency Interactive Network for Multimodal Remote Sensing Classification [45.80836671298513]
This paper introduces the spatial-spectral-frequency interaction network (S$2$Fin), which integrates pairwise fusion modules across the spatial, spectral, and frequency domains.<n> Experiments on four benchmark multimodal datasets with limited labeled data demonstrate that S$2$Fin performs superior classification, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-10-06T09:33:35Z) - A Cross-Hierarchical Multi-Feature Fusion Network Based on Multiscale Encoder-Decoder for Hyperspectral Change Detection [3.5421087596321352]
This paper proposes a cross-hierarchical multi-feature fusion network (CHMFFN) based on a multiscale encoder-decoder architecture.<n> Experiments on four public hyperspectral datasets show CHMFFN outperforms state-of-the-art methods, verifying its effectiveness.
arXiv Detail & Related papers (2025-09-21T09:04:28Z) - Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management.<n>Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions.<n>We observe that frequency-domain feature modeling particularly in the wavelet domain amplify fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z) - PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification [49.37555541088792]
Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components.<n>This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
arXiv Detail & Related papers (2025-04-27T07:21:42Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models.<n>We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components.<n>FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - Frequency Domain Enhanced U-Net for Low-Frequency Information-Rich Image Segmentation in Surgical and Deep-Sea Exploration Robots [34.28684917337352]
We address the differences in frequency band sensitivity between CNNs and the human visual system.<n>We propose a wavelet adaptive spectrum fusion (WASF) method inspired by biological vision mechanisms to balance cross-frequency image features.<n>We develop the FE-UNet model, which employs a SAM2 backbone network and incorporates fine-tuned Hiera-Large modules to ensure segmentation accuracy.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z) - Deep Frequency Filtering for Domain Generalization [55.66498461438285]
Deep Neural Networks (DNNs) have preferences for some frequency components in the learning process.
We propose Deep Frequency Filtering (DFF) for learning domain-generalizable features.
We show that applying our proposed DFF on a plain baseline outperforms the state-of-the-art methods on different domain generalization tasks.
arXiv Detail & Related papers (2022-03-23T05:19:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.