PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification
- URL: http://arxiv.org/abs/2504.19136v3
- Date: Fri, 17 Oct 2025 12:18:59 GMT
- Title: PAD: Phase-Amplitude Decoupling Fusion for Multi-Modal Land Cover Classification
- Authors: Huiling Zheng, Xian Zhong, Bin Liu, Yi Xiao, Bihan Wen, Xiaofeng Li
- Abstract summary: Phase-Amplitude Decoupling (PAD) is a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components. This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing.
- Score: 49.37555541088792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The fusion of Synthetic Aperture Radar (SAR) and RGB imagery for land cover classification remains challenging due to modality heterogeneity and underexploited spectral complementarity. Existing approaches often fail to decouple shared structural features from modality-complementary radiometric attributes, resulting in feature conflicts and information loss. To address this, we propose Phase-Amplitude Decoupling (PAD), a frequency-aware framework that separates phase (modality-shared) and amplitude (modality-complementary) components in the Fourier domain. This design reinforces shared structures while preserving complementary characteristics, thereby enhancing fusion quality. Unlike previous methods that overlook the distinct physical properties encoded in frequency spectra, PAD explicitly introduces amplitude-phase decoupling for multi-modal fusion. Specifically, PAD comprises two key components: 1) Phase Spectrum Correction (PSC), which aligns cross-modal phase features via convolution-guided scaling to improve geometric consistency; and 2) Amplitude Spectrum Fusion (ASF), which dynamically integrates high- and low-frequency patterns using frequency-adaptive multilayer perceptrons, effectively exploiting SAR's morphological sensitivity and RGB's spectral richness. Extensive experiments on WHU-OPT-SAR and DDHR-SK demonstrate state-of-the-art performance. This work establishes a new paradigm for physics-aware multi-modal fusion in remote sensing. The code will be available at https://github.com/RanFeng2/PAD.
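The decoupling idea described in the abstract can be illustrated with a minimal sketch: each modality's feature map is split into amplitude and phase via the 2-D FFT, phases are combined as the modality-shared structural component, amplitudes are blended as the modality-complementary radiometric component, and the fused map is recovered by the inverse FFT. This is a toy illustration of the general principle, not the paper's implementation: the simple phase mean and fixed blend weight `alpha` below stand in for PAD's learned Phase Spectrum Correction and frequency-adaptive Amplitude Spectrum Fusion modules.

```python
import numpy as np

def decompose(x):
    """Split a 2-D feature map into amplitude and phase via the 2-D FFT."""
    spec = np.fft.fft2(x)
    return np.abs(spec), np.angle(spec)

def phase_amplitude_fuse(sar_feat, rgb_feat, alpha=0.5):
    """Toy phase-amplitude fusion of two same-shaped 2-D feature maps.

    Phase (modality-shared structure) is averaged; amplitude
    (modality-complementary radiometry) is linearly blended.
    PAD instead learns these operations (PSC and ASF).
    """
    amp_s, pha_s = decompose(sar_feat)
    amp_r, pha_r = decompose(rgb_feat)
    # Shared structure: combine phases. A plain mean ignores phase
    # wrapping, which is one reason a learned correction is preferable.
    pha = (pha_s + pha_r) / 2.0
    # Complementary radiometry: blend amplitude spectra.
    amp = alpha * amp_s + (1.0 - alpha) * amp_r
    # Recombine and invert; real part discards numerical residue.
    return np.fft.ifft2(amp * np.exp(1j * pha)).real
```

When both inputs are identical, the decomposition and recombination are exact inverses, so the fused output reproduces the input; with distinct inputs, the output inherits structure from the averaged phase and energy distribution from the blended amplitude.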
Related papers
- ThermoSplat: Cross-Modal 3D Gaussian Splatting with Feature Modulation and Geometry Decoupling [11.169420448510095]
ThermoSplat is a novel framework that enables deep spectral-aware reconstruction through active feature modulation and adaptive geometry decoupling. Experiments on the RGBT-Scenes dataset demonstrate that ThermoSplat achieves state-of-the-art rendering quality across both visible and thermal spectrums.
arXiv Detail & Related papers (2026-01-22T12:24:26Z) - PAS-Mamba: Phase-Amplitude-Spatial State Space Model for MRI Reconstruction [12.528008672425173]
We propose a framework that decouples phase and magnitude modeling in the frequency domain and combines it with image-domain features for better reconstruction. Experiments on the IXI and fastMRI knee datasets show that PAS-Mamba consistently outperforms state-of-the-art reconstruction methods.
arXiv Detail & Related papers (2026-01-20T22:53:35Z) - SKANet: A Cognitive Dual-Stream Framework with Adaptive Modality Fusion for Robust Compound GNSS Interference Classification [47.20483076887704]
Global Navigation Satellite Systems (GNSS) face growing threats from sophisticated jamming interference. We propose a cognitive deep learning framework built upon a dual-stream architecture that integrates Time-Frequency Images (TFIs) and Power Spectral Density (PSD). We show that SKANet achieves an overall accuracy of 96.99%, exhibiting superior robustness for compound jamming classification.
arXiv Detail & Related papers (2026-01-19T07:42:45Z) - WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition [61.3530659856013]
We propose a novel decoder architecture, WaveSeg, which jointly optimizes feature refinement in the spatial and wavelet domains. High-frequency components are first learned from input images as explicit priors to reinforce boundary details. Experiments on standard benchmarks demonstrate that WaveSeg, leveraging a wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-24T01:41:31Z) - IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection [23.256601188227865]
We propose an innovative feature fusion framework based on a cross-modal feature contrastive and screening strategy. The proposed method adaptively enhances salient structures by fusing object-aware complementary cross-modal features. IRDFusion consistently outperforms existing methods across diverse challenging scenarios.
arXiv Detail & Related papers (2025-09-11T01:22:35Z) - Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z) - Efficient Dual-domain Image Dehazing with Haze Prior Perception [17.18810808188725]
Transformer-based models exhibit strong global modeling capabilities in single-image dehazing, but their high computational cost limits real-time applicability. We propose the Dark Channel Guided Frequency-aware Dehazing Network (DGFDNet), a novel dual-domain framework that performs physically guided degradation alignment. Experiments on four benchmark haze datasets demonstrate that DGFDNet achieves state-of-the-art performance with superior robustness and real-time efficiency.
arXiv Detail & Related papers (2025-07-15T06:56:56Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Unleashing Correlation and Continuity for Hyperspectral Reconstruction from RGB Images [64.80875911446937]
We propose a Correlation and Continuity Network (CCNet) for HSI reconstruction from RGB images. For the correlation of the local spectrum, we introduce the Group-wise Spectral Correlation Modeling (GrSCM) module. For the continuity of the global spectrum, we design the Neighborhood-wise Spectral Continuity Modeling (NeSCM) module.
arXiv Detail & Related papers (2025-01-02T15:14:40Z) - Deep Fourier-embedded Network for RGB and Thermal Salient Object Detection [8.607385112274882]
Deep learning has significantly improved salient object detection (SOD) by combining RGB and thermal (RGB-T) images. Existing deep learning-based RGB-T SOD models suffer from two major limitations. We propose a purely Fourier transform-based model, namely the Deep Fourier-Embedded Network (DFENet), for accurate RGB-T SOD.
arXiv Detail & Related papers (2024-11-27T14:55:16Z) - A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z) - SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening [14.293042131263924]
We introduce a spatial-spectral integrated diffusion model for the remote sensing pansharpening task, called SSDiff.
SSDiff considers the pansharpening process as the fusion process of spatial and spectral components from the perspective of subspace decomposition.
arXiv Detail & Related papers (2024-04-17T16:30:56Z) - A Dual Domain Multi-exposure Image Fusion Network based on the Spatial-Frequency Integration [57.14745782076976]
Multi-exposure image fusion aims to generate a single high-dynamic image by integrating images with different exposures.
We propose a novel perspective on multi-exposure image fusion via the Spatial-Frequency Integration Framework, named MEF-SFI.
Our method achieves visually appealing fusion results against state-of-the-art multi-exposure image fusion approaches.
arXiv Detail & Related papers (2023-12-17T04:45:15Z) - Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refinement, is devoted to learning the transformation and refinement of the phase spectrum.
arXiv Detail & Related papers (2023-08-14T08:23:58Z) - Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation [19.41334573257174]
Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness.
Recent studies show thermal images are robust to the night scenario as a compensating modality for segmentation.
This work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation.
arXiv Detail & Related papers (2023-06-17T14:28:08Z) - CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for
Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z) - Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery [0.6853165736531939]
Fusing the complementary cross-modality information of multispectral remote sensing image pairs can improve the perception ability of detection algorithms.
We propose a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions.
Our proposed approach can achieve the state-of-the-art performance at a low cost.
arXiv Detail & Related papers (2021-12-06T13:12:36Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation with great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - MESSFN: a Multi-level and Enhanced Spectral-Spatial Fusion Network for Pan-sharpening [17.129956512200454]
We propose a Multi-level and Enhanced Spectral-Spatial Fusion Network (MESSFN) with the following innovations.
A novel Spectral-Spatial stream is established to hierarchically derive and fuse the multi-level prior spectral and spatial expertise from the MS stream and the PAN stream.
Experiments on two datasets demonstrate that the network is competitive with or better than state-of-the-art methods.
arXiv Detail & Related papers (2021-09-21T03:38:52Z) - Dual-Octave Convolution for Accelerated Parallel MR Image Reconstruction [75.35200719645283]
We propose the Dual-Octave Convolution (Dual-OctConv), which is capable of learning multi-scale spatial-frequency features from both real and imaginary components.
By reformulating the complex operations using octave convolutions, our model shows a strong ability to capture richer representations of MR images.
arXiv Detail & Related papers (2021-04-12T10:51:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.