Related papers: ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution

ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution

URL: http://arxiv.org/abs/2410.13807v1
Date: Thu, 17 Oct 2024 17:41:52 GMT
Title: ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution
Authors: Junhao Gu, Peng-Tao Jiang, Hao Zhang, Mi Zhou, Jinwei Chen, Wenming Yang, Bo Li,
Abstract summary: Real-world image super-resolution (Real-ISR) aims at restoring high-quality (HQ) images from low-quality (LQ) inputs corrupted by unknown and complex degradations. We introduce ConsisSR to handle both semantic and pixel-level consistency.
Score: 28.945663118445037
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Real-world image super-resolution (Real-ISR) aims at restoring high-quality (HQ) images from low-quality (LQ) inputs corrupted by unknown and complex degradations. In particular, pretrained text-to-image (T2I) diffusion models provide strong generative priors to reconstruct credible and intricate details. However, T2I generation focuses on semantic consistency while Real-ISR emphasizes pixel-level reconstruction, which hinders existing methods from fully exploiting diffusion priors. To address this challenge, we introduce ConsisSR to handle both semantic and pixel-level consistency. Specifically, compared to coarse-grained text prompts, we exploit the more powerful CLIP image embedding and effectively leverage both modalities through our Hybrid Prompt Adapter (HPA) for semantic guidance. Secondly, we introduce Time-aware Latent Augmentation (TALA) to mitigate the inherent gap between T2I generation and Real-ISR consistency requirements. By randomly mixing LQ and HQ latent inputs, our model not only handle timestep-specific diffusion noise but also refine the accumulated latent representations. Last but not least, our GAN-Embedding strategy employs the pretrained Real-ESRGAN model to refine the diffusion start point. This accelerates the inference process to 10 steps while preserving sampling quality, in a training-free manner.Our method demonstrates state-of-the-art performance among both full-scale and accelerated models. The code will be made publicly available.

Related papers

One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR based on flow matching models. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR. Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z)
PromptLA: Towards Integrity Verification of Black-box Text-to-Image Diffusion Models [17.12906933388337]
Malicious actors can fine-tune text-to-image (T2I) diffusion models to generate illegal content. We propose a novel prompt selection algorithm based on learning automaton (PromptLA) for efficient and accurate verification.
arXiv Detail & Related papers (2024-12-20T07:24:32Z)
Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images [7.920423405957888]
E$2$DiffSR achieves superior objective metrics and visual quality compared to the state-of-the-art SR methods. It reduces the inference time of diffusion-based SR methods to a level comparable to that of non-diffusion methods.
arXiv Detail & Related papers (2024-10-30T09:14:13Z)
One-step Generative Diffusion for Realistic Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling. OSIRDiff performs rescaling operations in the latent space of a pre-trained autoencoder. It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
One-Step Effective Diffusion Network for Real-World Image Super-Resolution [11.326598938246558]
We propose a one-step effective diffusion network, namely OSEDiff, for the Real-ISR problem. We finetune the pre-trained diffusion network with trainable layers to adapt it to complex image degradations. Our OSEDiff model can efficiently and effectively generate HQ images in just one diffusion step.
arXiv Detail & Related papers (2024-06-12T13:10:31Z)
Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs) Existing binarization methods result in significant performance degradation. We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? [10.72249123249003]
We revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding. We introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions. LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS dataset with 38.2 BLEU@4 and 126.2 CIDEr.
arXiv Detail & Related papers (2024-04-16T17:47:16Z)
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution [14.935662351654601]
Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution. It is challenging for ISR models to perceive the semantic and degradation information, resulting in restoration images with incorrect content or unrealistic artifacts. We propose a textitCross-modal Priors for Super-Resolution (XPSR) framework to acquire precise and comprehensive semantic conditions for the diffusion model.
arXiv Detail & Related papers (2024-03-08T04:52:22Z)
Improving Diffusion-Based Image Synthesis with Context Prediction [49.186366441954846]
Existing diffusion models mainly try to reconstruct input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes. We propose ConPreDiff to improve diffusion-based image synthesis with context prediction. Our ConPreDiff consistently outperforms previous methods and achieves a new SOTA text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.
arXiv Detail & Related papers (2024-01-04T01:10:56Z)
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling. It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences. It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z)
Iterative Token Evaluation and Refinement for Real-World Super-Resolution [77.74289677520508]
Real-world image super-resolution (RWSR) is a long-standing problem as low-quality (LQ) images often have complex and unidentified degradations. We propose an Iterative Token Evaluation and Refinement framework for RWSR. We show that ITER is easier to train than Generative Adversarial Networks (GANs) and more efficient than continuous diffusion models.
arXiv Detail & Related papers (2023-12-09T17:07:32Z)
SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution [16.815468458589635]
We present a semantics-aware approach to better preserve the semantic fidelity of generative real-world image super-resolution. First, we train a degradation-aware prompt extractor, which can generate accurate soft and hard semantic prompts even under strong degradation. The experiments show that our method can reproduce more realistic image details and hold better the semantics.
arXiv Detail & Related papers (2023-11-27T18:11:19Z)
R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation [74.5598315066249]
We probe into zero-shot grounded T2I generation with diffusion models. We propose a Region and Boundary (R&B) aware cross-attention guidance approach.
arXiv Detail & Related papers (2023-10-13T05:48:42Z)
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process. We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images. In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z)
Diffusion Visual Counterfactual Explanations [51.077318228247925]
Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image. Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts. In this paper, we overcome this by generating Visual Diffusion Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers.
arXiv Detail & Related papers (2022-10-21T09:35:47Z)
DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution [69.2432352477966]
Real image super-resolution(Real-SR) focus on the relationship between real-world high-resolution(HR) and low-resolution(LR) image. In this article, we propose a Dual-path Dynamic Enhancement Network(DDet) for Real-SR. Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pair.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.