Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single-Image Denoising
- URL: http://arxiv.org/abs/2502.06432v2
- Date: Thu, 13 Mar 2025 12:49:20 GMT
- Title: Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single-Image Denoising
- Authors: Huaqiu Li, Wang Zhang, Xiaowan Hu, Tao Jiang, Zikang Chen, Haoqian Wang,
- Abstract summary: We introduce Prompt-SID, a prompt-learning-based single image denoising framework.<n>It captures original-scale image information through structural encoding and integrates this prompt into the denoiser.<n>We conduct comprehensive experiments on synthetic, real-world, and fluorescence imaging datasets, showcasing the remarkable effectiveness of Prompt-SID.
- Score: 23.07977702905715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many studies have concentrated on constructing supervised models utilizing paired datasets for image denoising, which proves to be expensive and time-consuming. Current self-supervised and unsupervised approaches typically rely on blind-spot networks or sub-image pairs sampling, resulting in pixel information loss and destruction of detailed structural information, thereby significantly constraining the efficacy of such methods. In this paper, we introduce Prompt-SID, a prompt-learning-based single image denoising framework that emphasizes preserving of structural details. This approach is trained in a self-supervised manner using downsampled image pairs. It captures original-scale image information through structural encoding and integrates this prompt into the denoiser. To achieve this, we propose a structural representation generation model based on the latent diffusion process and design a structural attention module within the transformer-based denoiser architecture to decode the prompt. Additionally, we introduce a scale replay training mechanism, which effectively mitigates the scale gap from images of different resolutions. We conduct comprehensive experiments on synthetic, real-world, and fluorescence imaging datasets, showcasing the remarkable effectiveness of Prompt-SID. Our code will be released at https://github.com/huaqlili/Prompt-SID.
Related papers
- Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models [33.76031793753807]
We adapt the autoregressive multimodal model Lumina-mGPT into a robust Real-ISR model, namely PURE.
PURE Perceives and Understands the input low-quality image, then REstores its high-quality counterpart.
Experimental results demonstrate that PURE preserves image content while generating realistic details.
arXiv Detail & Related papers (2025-03-14T04:33:59Z) - "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space.
Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z) - Hierarchical Information Flow for Generalized Efficient Image Restoration [108.83750852785582]
We propose a hierarchical information flow mechanism for image restoration, dubbed Hi-IR.<n>Hi-IR constructs a hierarchical information tree representing the degraded image across three levels.<n>In seven common image restoration tasks, Hi-IR achieves its effectiveness and generalizability.
arXiv Detail & Related papers (2024-11-27T18:30:08Z) - MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration [17.47612023350466]
We propose MRIR, a diffusion-based restoration method with multimodal insights.
For the textual level, we harness the power of the pre-trained multimodal large language model to infer meaningful semantic information from low-quality images.
For the visual level, we mainly focus on the pixel level control. Thus, we utilize a Pixel-level Processor and ControlNet to control spatial structures.
arXiv Detail & Related papers (2024-07-04T04:55:14Z) - Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [58.48890547818074]
We present a powerful modification of Contrastive Denoising Score (CUT) for latent diffusion models (LDM)
Our approach enables zero-shot imageto-image translation and neural field (NeRF) editing, achieving structural correspondence between the input and output.
arXiv Detail & Related papers (2023-11-30T15:06:10Z) - Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image
Denoising [16.43285056788183]
We propose a novel approach called the Reconstruct-and-Generate Diffusion Model (RnG)
Our method leverages a reconstructive denoising network to recover the majority of the underlying clean signal.
It employs a diffusion algorithm to generate residual high-frequency details, thereby enhancing visual quality.
arXiv Detail & Related papers (2023-09-19T16:01:20Z) - Improving Few-shot Image Generation by Structural Discrimination and
Textural Modulation [10.389698647141296]
Few-shot image generation aims to produce plausible and diverse images for one category given a few images from this category.
Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients.
This paper proposes a novel mechanism to inject external semantic signals into internal local representations.
arXiv Detail & Related papers (2023-08-30T16:10:21Z) - Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion.<n>This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement.<n>We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z) - Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network [52.77569396659629]
This paper presents the deep compensation network unfolding (DCUNet) for restoring light field (LF) images captured under low-light conditions.
The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result.
To properly leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module.
arXiv Detail & Related papers (2023-08-10T07:53:06Z) - Learning Detail-Structure Alternative Optimization for Blind
Super-Resolution [69.11604249813304]
We propose an effective and kernel-free network, namely DSSR, which enables recurrent detail-structure alternative optimization without blur kernel prior incorporation for blind SR.
In our DSSR, a detail-structure modulation module (DSMM) is built to exploit the interaction and collaboration of image details and structures.
Our method achieves the state-of-the-art against existing methods.
arXiv Detail & Related papers (2022-12-03T14:44:17Z) - Unsupervised Structure-Consistent Image-to-Image Translation [6.282068591820945]
The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation.
We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers.
The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code.
arXiv Detail & Related papers (2022-08-24T13:47:15Z) - Real Image Restoration via Structure-preserving Complementarity
Attention [10.200625895876023]
We propose a novel lightweight Complementary Attention Module, which includes a density module and a sparse module.
To reduce the loss of details caused by denoising, this paper constructs a gradient-based structure-preserving branch.
arXiv Detail & Related papers (2022-07-28T04:24:20Z) - Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the emphde facto Generative Adversarial Nets (GANs)
arXiv Detail & Related papers (2022-06-30T18:31:51Z) - SUMD: Super U-shaped Matrix Decomposition Convolutional neural network
for Image denoising [0.0]
We introduce the matrix decomposition module(MD) in the network to establish the global context feature.
Inspired by the design of multi-stage progressive restoration of U-shaped architecture, we further integrate the MD module into the multi-branches.
Our model(SUMD) can produce comparable visual quality and accuracy results with Transformer-based methods.
arXiv Detail & Related papers (2022-04-11T04:38:34Z) - Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z) - Defending Adversarial Examples via DNN Bottleneck Reinforcement [20.08619981108837]
This paper presents a reinforcement scheme to alleviate the vulnerability of Deep Neural Networks (DNN) against adversarial attacks.
By reinforcing the former while maintaining the latter, any redundant information, be it adversarial or not, should be removed from the latent representation.
In order to reinforce the information bottleneck, we introduce the multi-scale low-pass objective and multi-scale high-frequency communication for better frequency steering in the network.
arXiv Detail & Related papers (2020-08-12T11:02:01Z) - Learning Deformable Image Registration from Optimization: Perspective,
Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.