Image Super-Resolution with Text Prompt Diffusion
- URL: http://arxiv.org/abs/2311.14282v2
- Date: Tue, 12 Mar 2024 12:14:51 GMT
- Title: Image Super-Resolution with Text Prompt Diffusion
- Authors: Zheng Chen, Yulun Zhang, Jinjin Gu, Xin Yuan, Linghe Kong, Guihai
Chen, Xiaokang Yang
- Abstract summary: We introduce text prompts to image SR to provide degradation priors.
PromptSR utilizes the pre-trained language model (e.g., T5 or CLIP) to enhance restoration.
Experiments indicate that introducing text prompts into SR yields excellent results on both synthetic and real-world images.
- Score: 123.94190649199449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image super-resolution (SR) methods typically model degradation to improve
reconstruction accuracy in complex and unknown degradation scenarios. However,
extracting degradation information from low-resolution images is challenging,
which limits the model performance. To boost image SR performance, one feasible
approach is to introduce additional priors. Inspired by advancements in
multi-modal methods and text prompt image processing, we introduce text prompts
to image SR to provide degradation priors. Specifically, we first design a
text-image generation pipeline to integrate text into the SR dataset through
the text degradation representation and degradation model. The text
representation applies a binning-based discretization to describe the
degradation abstractly, which keeps the text flexible and user-friendly.
Meanwhile, we propose PromptSR to realize text-prompt SR. PromptSR utilizes a
pre-trained language model (e.g., T5 or CLIP) to enhance restoration. We train
the model on the generated text-image dataset. Extensive experiments indicate
that introducing text prompts into SR yields excellent results on both
synthetic and real-world images. Code is available at:
https://github.com/zhengchen1999/PromptSR.
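The binning-based text representation described in the abstract can be sketched as follows. Note that the bin boundaries, degradation parameters, and label wording below are illustrative assumptions, not the exact specification used by PromptSR:

```python
# Sketch of a binning-based text degradation representation
# (illustrative: bin edges, parameter names, and wording are assumptions,
# not the paper's exact specification).
import bisect

def bin_label(value, edges, labels):
    """Map a continuous degradation parameter to a discrete text label."""
    return labels[bisect.bisect_right(edges, value)]

def degradation_to_prompt(blur_sigma, noise_level, jpeg_quality):
    """Compose an abstract text prompt from degradation parameters."""
    blur = bin_label(blur_sigma, [0.5, 2.0], ["no", "light", "heavy"])
    noise = bin_label(noise_level, [5, 20], ["no", "light", "heavy"])
    jpeg = bin_label(100 - jpeg_quality, [10, 40], ["no", "light", "heavy"])
    return f"{blur} blur, {noise} noise, {jpeg} compression"

print(degradation_to_prompt(blur_sigma=3.1, noise_level=12, jpeg_quality=85))
# heavy blur, light noise, light compression
```

Discretizing into a few coarse bins (rather than emitting raw numbers) is what keeps such prompts user-friendly: a person can plausibly say "heavy blur" about a real photo without knowing the exact blur kernel.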
Related papers
- Improving Image Restoration through Removing Degradations in Textual
Representations [60.79045963573341]
We introduce a new perspective for improving image restoration by removing degradation in the textual representations of a degraded image.
To enable this cross-modal assistance, we propose to map degraded images into textual representations for removing the degradations.
In particular, we embed an image-to-text mapper and a text restoration module into CLIP-equipped text-to-image models to generate the guidance.
arXiv Detail & Related papers (2023-12-28T19:18:17Z)
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images.
We achieve this by marrying image appearance and language understanding to generate a cognitive embedding.
To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention".
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
- Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z)
- Scene Text Image Super-resolution based on Text-conditional Diffusion Models [0.0]
Scene Text Image Super-resolution (STISR) has recently achieved great success as a preprocessing method for scene text recognition.
In this study, we leverage text-conditional diffusion models (DMs) for STISR tasks.
We propose a novel framework for LR-HR paired text image datasets.
arXiv Detail & Related papers (2023-11-16T10:32:18Z)
- Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction [62.955327005837475]
Image super-resolution (SR) aims to learn a mapping from low-resolution (LR) to high-resolution (HR) using paired HR-LR training images.
We present an efficient test-time adaptation framework for SR, named SRTTA, which is able to quickly adapt SR models to test domains with different/unknown degradation types.
arXiv Detail & Related papers (2023-10-29T13:58:57Z)
- Sentence-level Prompts Benefit Composed Image Retrieval [69.78119883060006]
Composed image retrieval (CIR) is the task of retrieving specific images by using a query that involves both a reference image and a relative caption.
We propose to leverage pretrained V-L models, e.g., BLIP-2, to generate sentence-level prompts.
Our proposed method performs favorably against the state-of-the-art CIR methods on the Fashion-IQ and CIRR datasets.
arXiv Detail & Related papers (2023-10-09T07:31:44Z)
- Rethinking Super-Resolution as Text-Guided Details Generation [21.695227836312835]
We propose a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities.
The proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process.
arXiv Detail & Related papers (2022-07-14T01:46:38Z)
- Text Prior Guided Scene Text Image Super-resolution [11.396781380648756]
Scene text image super-resolution (STISR) aims to improve the resolution and visual quality of low-resolution (LR) scene text images.
We make an attempt to embed categorical text prior into STISR model training.
We present a multi-stage text prior guided super-resolution framework for STISR.
arXiv Detail & Related papers (2021-06-29T12:52:33Z)
- Scene Text Image Super-Resolution in the Wild [112.90416737357141]
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones.
Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images.
We propose a real scene text SR dataset, termed TextZoom.
It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild.
arXiv Detail & Related papers (2020-05-07T09:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.