Rethinking Super-Resolution as Text-Guided Details Generation
- URL: http://arxiv.org/abs/2207.06604v1
- Date: Thu, 14 Jul 2022 01:46:38 GMT
- Title: Rethinking Super-Resolution as Text-Guided Details Generation
- Authors: Chenxi Ma, Bo Yan, Qing Lin, Weimin Tan, Siming Chen
- Abstract summary: We propose a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities.
The proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process.
- Score: 21.695227836312835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have greatly promoted the performance of single image
super-resolution (SISR). Conventional methods still restore a single
high-resolution (HR) solution based only on the image modality input.
However, image-level information is insufficient to predict adequate
details and photo-realistic visual quality under large upscaling factors (x8,
x16). In this paper, we propose a new perspective that regards SISR as a
semantic image detail enhancement problem, generating semantically reasonable
HR images that are faithful to the ground truth. To enhance the semantic
accuracy and the visual quality of the reconstructed image, we explore
multi-modal fusion learning in SISR by proposing a Text-Guided Super-Resolution
(TGSR) framework, which can effectively utilize information from the text
and image modalities. Different from existing methods, the proposed TGSR
generates HR image details that match the text descriptions through a
coarse-to-fine process. Extensive experiments and ablation studies demonstrate
the effectiveness of TGSR, which exploits the text reference to recover realistic
images.
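The abstract's coarse-to-fine idea can be pictured as a two-stage pipeline: a coarse upsampling of the LR input, followed by a refinement step conditioned on a text embedding. The following is a minimal numpy sketch of that structure only, not the authors' architecture; `upsample_nearest`, `text_guided_refine`, and the projection matrix `w` are illustrative assumptions, standing in for the learned networks described in the paper.

```python
import numpy as np

def upsample_nearest(img, scale):
    """Coarse stage: nearest-neighbour upsampling of an (H, W) image."""
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def text_guided_refine(coarse, text_emb, w):
    """Fine stage (illustrative): project a text embedding into a
    per-pixel residual and add it to the coarse HR estimate."""
    # w: hypothetical learned projection, shape (emb_dim, H * W)
    residual = (text_emb @ w).reshape(coarse.shape)
    return coarse + residual

rng = np.random.default_rng(0)
lr = rng.random((4, 4))            # toy low-resolution input
coarse = upsample_nearest(lr, 8)   # x8 coarse HR estimate, shape (32, 32)
text_emb = rng.random(16)          # toy text embedding (e.g., an encoded caption)
w = rng.random((16, 32 * 32)) * 1e-3
hr = text_guided_refine(coarse, text_emb, w)
print(hr.shape)  # (32, 32)
```

In the actual framework, both stages would be learned networks and the text embedding would come from a language encoder; the sketch only shows how text information can enter the reconstruction as an extra conditioning signal.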
Related papers
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images.
We achieve this by marrying image appearance and language understanding to generate a cognitive embedding.
To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention"
arXiv Detail & Related papers (2023-11-27T16:33:29Z) - Image Super-Resolution with Text Prompt Diffusion [118.023531454099]
We introduce text prompts to image SR to provide degradation priors.
PromptSR utilizes a pre-trained language model (e.g., T5 or CLIP) to enhance restoration.
Experiments indicate that introducing text prompts into SR yields excellent results on both synthetic and real-world images.
arXiv Detail & Related papers (2023-11-24T05:11:35Z) - Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z) - HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution [32.4847482760475]
Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images.
In this paper, we propose a novel idea to boost STISR by first enhancing the quality of HR images and then using the enhanced HR images as supervision.
arXiv Detail & Related papers (2023-07-31T05:32:57Z) - Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works, and remits the aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z) - Single Image Internal Distribution Measurement Using Non-Local Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely the non-local variational autoencoder (NLVAE).
NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood.
Experimental results on seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z) - Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of a HR image.
Previous model-based methods mainly take the entire image as a whole and assume a prior distribution between the HR target image and the HR guidance image.
We propose a maximum a posteriori (MAP) estimation model for GISR with two types of priors on the HR target image.
arXiv Detail & Related papers (2022-02-12T15:37:13Z) - Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input.
Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task.
We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
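The Beby-GAN summary describes relaxing the fixed one-to-one LR-HR pairing so that each estimated patch can pick the most compatible supervision target. A minimal numpy sketch of that selection step, under the assumption that "best supervision" means the nearest candidate patch in L2 distance (the function name and patch search here are illustrative, not the paper's exact procedure):

```python
import numpy as np

def best_buddy_target(est_patch, candidates):
    """Return the candidate HR patch closest (squared L2) to the
    estimated patch, i.e. a relaxed, dynamically chosen supervision
    target instead of a fixed one-to-one ground-truth pairing."""
    dists = [float(np.sum((est_patch - c) ** 2)) for c in candidates]
    return candidates[int(np.argmin(dists))]

est = np.zeros((3, 3))  # toy estimated patch
candidates = [np.ones((3, 3)), np.full((3, 3), 0.1), np.full((3, 3), 5.0)]
target = best_buddy_target(est, candidates)
print(target[0, 0])  # 0.1
```

In the actual method this selection would feed a reconstruction loss inside GAN training; the sketch only isolates the "seek the best supervision" idea.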
arXiv Detail & Related papers (2021-03-29T02:58:27Z) - Learning Structural Coherence via Generative Adversarial Network for Single Image Super-Resolution [13.803141755183827]
Recent generative adversarial network (GAN) based SISR methods have yielded overall realistic SR images.
We introduce the gradient branch into the generator to preserve structural information by restoring high-resolution gradient maps in SR process.
In addition, we utilize a U-net based discriminator to consider both the whole image and the detailed per-pixel authenticity.
arXiv Detail & Related papers (2021-01-25T15:26:23Z) - Scene Text Image Super-Resolution in the Wild [112.90416737357141]
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones.
Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images.
We propose a real scene text SR dataset, termed TextZoom.
It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild.
arXiv Detail & Related papers (2020-05-07T09:18:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.