Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution
- URL: http://arxiv.org/abs/2403.11078v1
- Date: Sun, 17 Mar 2024 04:08:58 GMT
- Title: Adaptive Semantic-Enhanced Denoising Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution
- Authors: Jialu Sui, Xianping Ma, Xiaokang Zhang, Man-On Pun
- Abstract summary: Denoising Diffusion Probabilistic Model (DDPM) has shown promising performance in image reconstructions.
High-frequency details generated by DDPM often suffer from misalignment with HR images due to the model's tendency to overlook long-range semantic contexts.
An adaptive semantic-enhanced DDPM (ASDDPM) is proposed to enhance the detail-preserving capability of the DDPM.
- Score: 7.252121550658619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote sensing image super-resolution (SR) is a crucial task to restore high-resolution (HR) images from low-resolution (LR) observations. Recently, the Denoising Diffusion Probabilistic Model (DDPM) has shown promising performance in image reconstruction by overcoming problems inherent in generative models, such as over-smoothing and mode collapse. However, the high-frequency details generated by DDPM often suffer from misalignment with HR images due to the model's tendency to overlook long-range semantic contexts. This is attributed to the widely used U-Net decoder in the conditional noise predictor, which tends to overemphasize local information, leading to the generation of noise with significant variance during the prediction process. To address these issues, an adaptive semantic-enhanced DDPM (ASDDPM) is proposed to enhance the detail-preserving capability of the DDPM by incorporating low-frequency semantic information provided by the Transformer. Specifically, a novel adaptive diffusion Transformer decoder (ADTD) is developed to bridge the semantic gap between the encoder and decoder by regulating the noise prediction with the global contextual relationships and long-range dependencies in the diffusion process. Additionally, a residual feature fusion strategy establishes information exchange between the two decoders at multiple levels. As a result, the predicted noise generated by our approach closely approximates that of the real noise distribution. Extensive experiments on two SR and two semantic segmentation datasets confirm the superior performance of the proposed ASDDPM in both SR and the subsequent downstream applications. The source code will be available at https://github.com/littlebeen/ASDDPM-Adaptive-Semantic-Enhanced-DDPM.
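The noise-prediction objective the abstract builds on can be sketched in plain Python. This is a minimal illustration of standard DDPM mechanics under a linear beta schedule, not the authors' ASDDPM implementation; `predict_noise` is a hypothetical stand-in for the conditional noise predictor (the U-Net plus ADTD in the paper), and the "image" is reduced to a single scalar for clarity.

```python
import math
import random

# Linear beta schedule and cumulative alpha products, as in a standard DDPM.
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

def q_sample(x0, t, eps):
    """Closed-form forward noising: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    abar = alpha_bars[t]
    return math.sqrt(abar) * x0 + math.sqrt(1.0 - abar) * eps

def training_loss(predict_noise, x0, t):
    """Epsilon-prediction MSE for one scalar sample.

    `predict_noise(x_t, t)` stands in for the conditional noise predictor;
    the training signal is how closely it recovers the injected noise `eps`.
    """
    eps = random.gauss(0.0, 1.0)
    x_t = q_sample(x0, t, eps)
    eps_hat = predict_noise(x_t, t)
    return (eps_hat - eps) ** 2
```

The paper's claim that the predicted noise "closely approximates the real noise distribution" corresponds to driving this epsilon-prediction error down across all timesteps.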
Related papers
- DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates joint distribution from noise.
Experiments on the AudioCaps and Clotho datasets with superior performances, verify the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-16T06:33:26Z)
- Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints [27.049330099874396]
This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models.
Our experimental results demonstrate significant improvements in pixel-level metrics like peak signal to noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS).
arXiv Detail & Related papers (2024-07-26T02:34:25Z)
- UDHF2-Net: Uncertainty-diffusion-model-based High-Frequency TransFormer Network for Remotely Sensed Imagery Interpretation [12.24506241611653]
The uncertainty-diffusion-model-based high-frequency Transformer network (UDHF2-Net) is the first of its kind to be proposed.
UDHF2-Net adopts a spatially-stationary-and-non-stationary high-frequency connection paradigm (SHCP).
Its mask-and-geo-knowledge-based uncertainty diffusion module (MUDM) is a self-supervised learning strategy.
A frequency-wise semi-pseudo-Siamese UDHF2-Net is also the first to be proposed to balance accuracy and complexity for change detection.
arXiv Detail & Related papers (2024-06-23T15:03:35Z)
- Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images [14.236580915897585]
Remote sensing image change captioning (RSICC) aims to generate human-like language describing semantic changes between bi-temporal remote sensing image pairs.
Inspired by the remarkable generative power of diffusion models, we propose a probabilistic diffusion model for RSICC.
In the training process, we construct a noise predictor conditioned on cross-modal features to learn the mapping from the real caption distribution to the standard Gaussian distribution under the Markov chain.
In the testing phase, the well-trained noise predictor helps to estimate the mean value of the distribution and generate change captions step by step.
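The step-by-step generation described here follows the standard DDPM reverse process: at each timestep the predicted noise is used to estimate the posterior mean, then scaled Gaussian noise is re-injected. A minimal, self-contained sketch of one such reverse step, with `predict_noise` as a hypothetical stand-in for the trained predictor (not this paper's actual model):

```python
import math
import random

# Standard DDPM linear beta schedule (illustrative values).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []
prod = 1.0
for a in alphas:
    prod *= a
    alpha_bars.append(prod)

def p_sample_step(predict_noise, x_t, t):
    """One reverse (denoising) step on a scalar sample.

    Estimates the posterior mean from the predicted noise, then adds
    scaled Gaussian noise, except at the final step t == 0.
    """
    beta, alpha, abar = betas[t], alphas[t], alpha_bars[t]
    eps_hat = predict_noise(x_t, t)
    mean = (x_t - beta / math.sqrt(1.0 - abar) * eps_hat) / math.sqrt(alpha)
    if t == 0:
        return mean
    return mean + math.sqrt(beta) * random.gauss(0.0, 1.0)
```

Iterating `p_sample_step` from t = T-1 down to 0, starting from pure Gaussian noise, yields the generated sample.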
arXiv Detail & Related papers (2024-05-21T15:44:31Z)
- BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution [52.47005445345593]
BlindDiff is a DM-based blind SR method designed to tackle blind degradation settings in single image super-resolution (SISR).
BlindDiff seamlessly integrates the MAP-based optimization into DMs.
Experiments on both synthetic and real-world datasets show that BlindDiff achieves the state-of-the-art performance.
arXiv Detail & Related papers (2024-03-15T11:21:34Z)
- Inference Stage Denoising for Undersampled MRI Reconstruction [13.8086726938161]
Reconstruction of magnetic resonance imaging (MRI) data has been positively affected by deep learning.
A key challenge remains: to improve generalisation to distribution shifts between the training and testing data.
arXiv Detail & Related papers (2024-02-12T12:50:10Z)
- SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation [96.11061713135385]
This work presents a new score-decomposed diffusion model to explicitly optimize the tangled distributions during image generation.
We equalize the refinement parts of the score function and energy guidance, which permits multi-objective optimization on the manifold.
SDDM outperforms existing SBDM-based methods with far fewer diffusion steps on several I2I benchmarks.
arXiv Detail & Related papers (2023-08-04T06:21:57Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
- f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation [56.04628143914542]
Diffusion models (DMs) have recently emerged as SoTA tools for generative modeling in various domains.
We propose f-DM, a generalized family of DMs which allows progressive signal transformation.
We apply f-DM in image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations.
arXiv Detail & Related papers (2022-10-10T18:49:25Z)
- Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation [5.238765582868391]
We propose a learned multi-resolution image compression framework, which exploits octave convolutions to factorize the latent representations into the high-resolution (HR) and low-resolution (LR) parts.
Experimental results show that our method reduces the decoding time by approximately 73.35% and 93.44%, respectively, compared with state-of-the-art learned image compression methods.
arXiv Detail & Related papers (2022-09-07T08:21:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.