Related papers: DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

URL: http://arxiv.org/abs/2406.16477v1
Date: Mon, 24 Jun 2024 09:30:36 GMT
Title: DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution
Authors: Aiwen Jiang, Zhi Wei, Long Peng, Feiqiang Liu, Wenbo Li, Mingwen Wang,
Abstract summary: This paper proposes to leverage degradation-aligned language prompt for accurate, fine-grained, and high-fidelity image restoration. The proposed method achieves a new state-of-the-art perceptual quality level, especially in real-world cases.
Score: 19.33582308829547
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Image super-resolution pursuits reconstructing high-fidelity high-resolution counterpart for low-resolution image. In recent years, diffusion-based models have garnered significant attention due to their capabilities with rich prior knowledge. The success of diffusion models based on general text prompts has validated the effectiveness of textual control in the field of text2image. However, given the severe degradation commonly presented in low-resolution images, coupled with the randomness characteristics of diffusion models, current models struggle to adequately discern semantic and degradation information within severely degraded images. This often leads to obstacles such as semantic loss, visual artifacts, and visual hallucinations, which pose substantial challenges for practical use. To address these challenges, this paper proposes to leverage degradation-aligned language prompt for accurate, fine-grained, and high-fidelity image restoration. Complementary priors including semantic content descriptions and degradation prompts are explored. Specifically, on one hand, image-restoration prompt alignment decoder is proposed to automatically discern the degradation degree of LR images, thereby generating beneficial degradation priors for image restoration. On the other hand, much richly tailored descriptions from pretrained multimodal large language model elicit high-level semantic priors closely aligned with human perception, ensuring fidelity control for image restoration. Comprehensive comparisons with state-of-the-art methods have been done on several popular synthetic and real-world benchmark datasets. The quantitative and qualitative analysis have demonstrated that the proposed method achieves a new state-of-the-art perceptual quality level, especially in real-world cases based on reference-free metrics.

Related papers

From Enhancement to Understanding: Build a Generalized Bridge for Low-light Vision via Semantically Consistent Unsupervised Fine-tuning [65.94580484237737]
Low-light enhancement improves image quality for downstream tasks, but existing methods rely on physical or geometric priors.<n>We build a generalized bridge between low-light enhancement and low-light understanding, which we term Generalized Enhancement For Understanding (GEFU)<n>To address the diverse causes of low-light degradation, we leverage pretrained generative diffusion models to optimize images, achieving zero-shot generalization performance.
arXiv Detail & Related papers (2025-07-11T07:51:26Z)
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration [29.910376294021052]
We propose textbfDefusion, a novel all-in-one image restoration framework that utilizes visual instruction-guided degradation diffusion.<n>We show that Defusion outperforms state-of-the-art methods across diverse image restoration tasks, including complex and real-world degradations.
arXiv Detail & Related papers (2025-06-20T12:50:42Z)
Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models [33.76031793753807]
We adapt the autoregressive multimodal model Lumina-mGPT into a robust Real-ISR model, namely PURE. PURE Perceives and Understands the input low-quality image, then REstores its high-quality counterpart. Experimental results demonstrate that PURE preserves image content while generating realistic details.
arXiv Detail & Related papers (2025-03-14T04:33:59Z)
Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration [19.87693298262894]
We propose Diff-Restorer, a universal image restoration method based on the diffusion model. We utilize the pre-trained visual language model to extract visual prompts from degraded images. We also design a Degradation-aware Decoder to perform structural correction and convert the latent code to the pixel domain.
arXiv Detail & Related papers (2024-07-04T05:01:10Z)
SSP-IR: Semantic and Structure Priors for Diffusion-based Realistic Image Restoration [20.873676111265656]
SSP-IR aims to fully exploit semantic and structure priors from low-quality images. Our method outperforms other state-of-the-art methods overall on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-07-04T04:55:14Z)
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution [14.935662351654601]
Diffusion-based methods, endowed with a formidable generative prior, have received increasing attention in Image Super-Resolution. It is challenging for ISR models to perceive the semantic and degradation information, resulting in restoration images with incorrect content or unrealistic artifacts. We propose a textitCross-modal Priors for Super-Resolution (XPSR) framework to acquire precise and comprehensive semantic conditions for the diffusion model.
arXiv Detail & Related papers (2024-03-08T04:52:22Z)
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS) A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images. We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image. The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images. We achieve this by marrying image appearance and language understanding to generate a cognitive embedding. To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention"
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images. Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance. We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z)
All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation [47.00239809958627]
We propose a novel All-in-one Multi-degradation Image Restoration Network (AMIRNet) AMIRNet learns a degradation representation for unknown degraded images by progressively constructing a tree structure through clustering. This tree-structured representation explicitly reflects the consistency and discrepancy of various distortions, providing a specific clue for image restoration.
arXiv Detail & Related papers (2023-08-06T04:51:41Z)
Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents a holistic goal of maintaining spatially-precise high-resolution representations through the entire network. We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
Single Image Internal Distribution Measurement Using Non-Local Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely non-local variational autoencoder (textttNLVAE) textttNLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood. Experimental results from seven benchmark datasets demonstrate the effectiveness of the textttNLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z)
Invertible Image Rescaling [118.2653765756915]
We develop an Invertible Rescaling Net (IRN) to produce visually-pleasing low-resolution images. We capture the distribution of the lost information using a latent variable following a specified distribution in the downscaling process.
arXiv Detail & Related papers (2020-05-12T09:55:53Z)
Real-world Person Re-Identification via Degradation Invariance Learning [111.86722193694462]
Person re-identification (Re-ID) in real-world scenarios usually suffers from various degradation factors, e.g., low-resolution, weak illumination, blurring and adverse weather. We propose a degradation invariance learning framework for real-world person Re-ID. By introducing a self-supervised disentangled representation learning strategy, our method is able to simultaneously extract identity-related robust features.
arXiv Detail & Related papers (2020-04-10T07:58:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.