UDHF2-Net: An Uncertainty-diffusion-model-based High-Frequency TransFormer Network for High-accuracy Interpretation of Remotely Sensed Imagery
- URL: http://arxiv.org/abs/2406.16129v1
- Date: Sun, 23 Jun 2024 15:03:35 GMT
- Title: UDHF2-Net: An Uncertainty-diffusion-model-based High-Frequency TransFormer Network for High-accuracy Interpretation of Remotely Sensed Imagery
- Authors: Pengfei Zhang, Chang Li, Yongjun Zhang, Rongjun Qin,
- Abstract summary: Uncertainty-diffusion-model-based high-Frequency TransFormer network (UDHF2-Net) is proposed for remotely sensed image high-accuracy interpretation (RSIHI)
- Score: 12.24506241611653
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remotely sensed image high-accuracy interpretation (RSIHI), including tasks such as semantic segmentation and change detection, faces the three major problems: (1) complementarity problem of spatially stationary-and-non-stationary frequency; (2) edge uncertainty problem caused by down-sampling in the encoder step and intrinsic edge noises; and (3) false detection problem caused by imagery registration error in change detection. To solve the aforementioned problems, an uncertainty-diffusion-model-based high-Frequency TransFormer network (UDHF2-Net) is the proposed for RSIHI, the superiority of which is as following: (1) a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP) is proposed to enhance the interaction of spatially stationary and non-stationary frequency features to yield high-fidelity edge extraction result. Inspired by HRFormer, SHCP remains the high-frequency stream through the whole encoder-decoder process with parallel high-to-low frequency streams and reduces the edge loss by a downsampling operation; (2) a mask-and-geo-knowledge-based uncertainty diffusion module (MUDM) is proposed to improve the robustness and edge noise resistance. MUDM could further optimize the uncertain region to improve edge extraction result by gradually removing the multiple geo-knowledge-based noises; (3) a semi-pseudo-Siamese UDHF2-Net for change detection task is proposed to reduce the pseudo change by registration error. It adopts semi-pseudo-Siamese architecture to extract above complemental frequency features for adaptively reducing registration differencing, and MUDM to recover the uncertain region by gradually reducing the registration error besides above edge noises. Comprehensive experiments were performed to demonstrate the superiority of UDHF2-Net. Especially ablation experiments indicate the effectiveness of UDHF2-Net.
Related papers
- Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management.<n>Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions.<n>We observe that frequency-domain feature modeling particularly in the wavelet domain amplify fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z) - SWAN: Synergistic Wavelet-Attention Network for Infrared Small Target Detection [8.098063209250684]
Infrared small target detection (IRSTD) is critical in both civilian and military applications.<n>Recent methods focus on conventional convolution operations, which primarily capture local spatial patterns.<n>We propose the Synergistic Wavelet-Attention Network (SWAN), a novel framework designed to perceive targets from both spatial and frequency domains.
arXiv Detail & Related papers (2025-08-02T11:26:58Z) - FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability.
We propose Self-supervised Transfer (PST) and FrequencyDe-coupled Fusion module (FreDF)
PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models.
FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z) - Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar [8.674241138986925]
The system employs a planar array configuration and adapts its transmitted waveforms and beamforming patterns to optimize detection performance.
A robust Wald-type detector is integrated with a SARSA-based RL algorithm, enabling the radar to learn and adapt to complex clutter environments.
arXiv Detail & Related papers (2025-02-07T14:31:58Z) - ΩSFormer: Dual-Modal Ω-like Super-Resolution Transformer Network for Cross-scale and High-accuracy Terraced Field Vectorization Extraction [14.821191612452418]
Terraced field is a significant engineering practice for soil and water conservation (SWC)
This study is the first to propose a novel dual-modal Omega-like super-resolution Transformer network for intelligent TFVE.
arXiv Detail & Related papers (2024-11-26T04:00:28Z) - M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising [63.39134873744748]
Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images.
This paper proposes a novel noise-resistant M3DM-NR framework to leverage strong multi-modal discriminative capabilities of CLIP.
Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in 3D-RGB multi-modal noisy anomaly detection.
arXiv Detail & Related papers (2024-06-04T12:33:02Z) - Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution [7.252121550658619]
Denoising Diffusion Probabilistic Model (DDPM) has shown promising performance in image reconstructions.
High-frequency details generated by DDPM often suffer from misalignment with HR images due to model's tendency to overlook long-range semantic contexts.
An adaptive semantic-enhanced DDPM (ASDDPM) is proposed to enhance the detail-preserving capability of the DDPM.
arXiv Detail & Related papers (2024-03-17T04:08:58Z) - Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data.
We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising.
Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z) - The Blind Normalized Stein Variational Gradient Descent-Based Detection for Intelligent Massive Random Access [0.7655800373514546]
We present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model.
A novel blind normalized Stein variational gradient descent (SVGD)-based detector is proposed to obtain an approximate solution to the MLE model.
The proposed block MHT layer outperforms other transform-based methods in terms of costs and denoising performance.
arXiv Detail & Related papers (2024-03-08T04:08:40Z) - Global Context Aggregation Network for Lightweight Saliency Detection of
Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module.
The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z) - Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z) - Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral
Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering.
To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed.
This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss who can achieve trend-level alignment with SkewIoU loss.
Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - FreqNet: A Frequency-domain Image Super-Resolution Network with Dicrete
Cosine Transform [16.439669339293747]
Single image super-resolution(SISR) is an ill-posed problem that aims to obtain high-resolution (HR) output from low-resolution (LR) input.
Despite the high peak signal-to-noise ratios(PSNR) results, it is difficult to determine whether the model correctly adds desired high-frequency details.
We propose FreqNet, an intuitive pipeline from the frequency domain perspective, to solve this problem.
arXiv Detail & Related papers (2021-11-21T11:49:12Z) - Learning a Model-Driven Variational Network for Deformable Image
Registration [89.9830129923847]
VR-Net is a novel cascaded variational network for unsupervised deformable image registration.
It outperforms state-of-the-art deep learning methods on registration accuracy.
It maintains the fast inference speed of deep learning and the data-efficiency of variational model.
arXiv Detail & Related papers (2021-05-25T21:37:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.