Related papers: UDHF2-Net: Uncertainty-diffusion-model-based High-Frequency TransFormer Network for Remotely Sensed Imagery Interpretation

UDHF2-Net: Uncertainty-diffusion-model-based High-Frequency TransFormer Network for Remotely Sensed Imagery Interpretation

URL: http://arxiv.org/abs/2406.16129v2
Date: Thu, 31 Oct 2024 15:46:45 GMT
Title: UDHF2-Net: Uncertainty-diffusion-model-based High-Frequency TransFormer Network for Remotely Sensed Imagery Interpretation
Authors: Pengfei Zhang, Chang Li, Yongjun Zhang, Rongjun Qin,
Abstract summary: Uncertainty-diffusion-model-based high-Frequency TransFormer network (UDHF2-Net) is the first to be proposed. UDHF2-Net is a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP) Mask-and-geo-knowledge-based uncertainty diffusion module (MUDM) is a self-supervised learning strategy. A frequency-wise semi-pseudo-Siamese UDHF2-Net is the first to be proposed to balance accuracy and complexity for change detection.
Score: 12.24506241611653
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Remotely sensed imagery interpretation (RSII) faces the three major problems: (1) objective representation of spatial distribution patterns; (2) edge uncertainty problem caused by downsampling encoder and intrinsic edge noises (e.g., mixed pixel and edge occlusion etc.); and (3) false detection problem caused by geometric registration error in change detection. To solve the aforementioned problems, uncertainty-diffusion-model-based high-Frequency TransFormer network (UDHF2-Net) is the first to be proposed, whose superiorities are as follows: (1) a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP) is proposed to enhance the interaction of spatially frequency-wise stationary and non-stationary features to yield high-fidelity edge extraction result. Inspired by HRFormer, SHCP proposes high-frequency-wise stream to replace high-resolution-wise stream in HRFormer through the whole encoder-decoder process with parallel frequency-wise high-to-low streams, so it improves the edge extraction accuracy by continuously remaining high-frequency information; (2) a mask-and-geo-knowledge-based uncertainty diffusion module (MUDM), which is a self-supervised learning strategy, is proposed to improve the edge accuracy of extraction and change detection by gradually removing the simulated spectrum noises based on geo-knowledge and the generated diffused spectrum noises; (3) a frequency-wise semi-pseudo-Siamese UDHF2-Net is the first to be proposed to balance accuracy and complexity for change detection. Besides the aforementioned spectrum noises in semantic segmentation, MUDM is also a self-supervised learning strategy to effectively reduce the edge false change detection from the generated imagery with geometric registration error.

Related papers

FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and FrequencyDe-coupled Fusion module (FreDF) PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
Towards Smarter Sensing: 2D Clutter Mitigation in RL-Driven Cognitive MIMO Radar [8.674241138986925]
The system employs a planar array configuration and adapts its transmitted waveforms and beamforming patterns to optimize detection performance. A robust Wald-type detector is integrated with a SARSA-based RL algorithm, enabling the radar to learn and adapt to complex clutter environments.
arXiv Detail & Related papers (2025-02-07T14:31:58Z)
ΩSFormer: Dual-Modal Ω-like Super-Resolution Transformer Network for Cross-scale and High-accuracy Terraced Field Vectorization Extraction [14.821191612452418]
Terraced field is a significant engineering practice for soil and water conservation (SWC) This study is the first to propose a novel dual-modal Omega-like super-resolution Transformer network for intelligent TFVE.
arXiv Detail & Related papers (2024-11-26T04:00:28Z)
M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising [63.39134873744748]
Existing industrial anomaly detection methods primarily concentrate on unsupervised learning with pristine RGB images. This paper proposes a novel noise-resistant M3DM-NR framework to leverage strong multi-modal discriminative capabilities of CLIP. Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in 3D-RGB multi-modal noisy anomaly detection.
arXiv Detail & Related papers (2024-06-04T12:33:02Z)
Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution [7.252121550658619]
Denoising Diffusion Probabilistic Model (DDPM) has shown promising performance in image reconstructions. High-frequency details generated by DDPM often suffer from misalignment with HR images due to model's tendency to overlook long-range semantic contexts. An adaptive semantic-enhanced DDPM (ASDDPM) is proposed to enhance the detail-preserving capability of the DDPM.
arXiv Detail & Related papers (2024-03-17T04:08:58Z)
Hybrid Convolutional and Attention Network for Hyperspectral Image Denoising [54.110544509099526]
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data. We propose a hybrid convolution and attention network (HCANet) to enhance HSI denoising. Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet.
arXiv Detail & Related papers (2024-03-15T07:18:43Z)
The Blind Normalized Stein Variational Gradient Descent-Based Detection for Intelligent Massive Random Access [0.7655800373514546]
We present a novel early preamble detection scheme based on a maximum likelihood estimation (MLE) model. A novel blind normalized Stein variational gradient descent (SVGD)-based detector is proposed to obtain an approximate solution to the MLE model. The proposed block MHT layer outperforms other transform-based methods in terms of costs and denoising performance.
arXiv Detail & Related papers (2024-03-08T04:08:40Z)
Global Context Aggregation Network for Lightweight Saliency Detection of Surface Defects [70.48554424894728]
We develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure. First, we introduce a novel transformer encoder on the top layer of the lightweight backbone, which captures global context information through a novel Depth-wise Self-Attention (DSA) module. The experimental results on three public defect datasets demonstrate that the proposed network achieves a better trade-off between accuracy and running efficiency compared with other 17 state-of-the-art methods.
arXiv Detail & Related papers (2023-09-22T06:19:11Z)
Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain. Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage. Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering. To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed. This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z)
The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss who can achieve trend-level alignment with SkewIoU loss. Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU. The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z)
FreqNet: A Frequency-domain Image Super-Resolution Network with Dicrete Cosine Transform [16.439669339293747]
Single image super-resolution(SISR) is an ill-posed problem that aims to obtain high-resolution (HR) output from low-resolution (LR) input. Despite the high peak signal-to-noise ratios(PSNR) results, it is difficult to determine whether the model correctly adds desired high-frequency details. We propose FreqNet, an intuitive pipeline from the frequency domain perspective, to solve this problem.
arXiv Detail & Related papers (2021-11-21T11:49:12Z)
Learning a Model-Driven Variational Network for Deformable Image Registration [89.9830129923847]
VR-Net is a novel cascaded variational network for unsupervised deformable image registration. It outperforms state-of-the-art deep learning methods on registration accuracy. It maintains the fast inference speed of deep learning and the data-efficiency of variational model.
arXiv Detail & Related papers (2021-05-25T21:37:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.