HiREN: Towards Higher Supervision Quality for Better Scene Text Image
Super-Resolution
- URL: http://arxiv.org/abs/2307.16410v1
- Date: Mon, 31 Jul 2023 05:32:57 GMT
- Title: HiREN: Towards Higher Supervision Quality for Better Scene Text Image
Super-Resolution
- Authors: Minyi Zhao, Yi Xu, Bingjia Li, Jie Wang, Jihong Guan, and Shuigeng
Zhou
- Abstract summary: Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images.
In this paper, we propose a novel idea to boost STISR by first enhancing the quality of HR images and then using the enhanced HR images as supervision.
- Score: 32.4847482760475
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scene text image super-resolution (STISR) is an important pre-processing
technique for text recognition from low-resolution scene images. Various
methods have been proposed to extract text-specific information from
high-resolution (HR) images to supervise STISR model training. However, due to
uncontrollable factors (e.g. shooting equipment, focus, and environment) in
manually photographing HR images, the quality of HR images cannot be
guaranteed, which unavoidably impacts STISR performance. Motivated by this
quality issue, we propose to boost STISR by first enhancing the quality of HR
images and then using the enhanced HR images as supervision. Concretely, we
develop a new STISR framework,
called High-Resolution ENhancement (HiREN) that consists of two branches and a
quality estimation module. The first branch is developed to recover the
low-resolution (LR) images, and the other is an HR quality enhancement branch
aiming at generating high-quality (HQ) text images based on the HR images to
provide more accurate supervision to the LR images. As the degradation from HQ
to HR may be diverse, and there is no pixel-level supervision for HQ image
generation, we design a kernel-guided enhancement network to handle various
degradations, and exploit the feedback from a recognizer and text-level
annotations as weak supervision signals to train the HR enhancement branch.
Then, a quality estimation module is employed to estimate the quality of the HQ
images, and the resulting scores are used to suppress erroneous supervision by
weighting the loss of each image. Extensive experiments on TextZoom show that
HiREN can work well with most existing STISR methods and significantly boost
their performance.
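The loss-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract only says that quality estimates are used to weight the loss of each image; the function name `quality_weighted_loss`, the normalisation by the sum of the weights, and the `eps` term are assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def quality_weighted_loss(
    per_image_losses: Sequence[float],
    quality_scores: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Weighted mean of per-image losses, using quality scores as weights.

    Hypothetical helper: images whose enhanced target is judged unreliable
    (low score) contribute less to the batch loss.
    """
    if len(per_image_losses) != len(quality_scores):
        raise ValueError("need one quality score per image")
    if any(q < 0 for q in quality_scores):
        raise ValueError("quality scores must be non-negative")
    total_weight = sum(quality_scores)
    weighted_sum = sum(
        loss * q for loss, q in zip(per_image_losses, quality_scores)
    )
    # Assumption: normalise by the total weight so the scale stays comparable
    # to an unweighted mean; eps guards against an all-zero batch.
    return weighted_sum / (total_weight + eps)


# The second image has a quality score of 0, so its large loss is ignored
# and the result is close to the mean of the other two losses (about 0.3).
losses = [0.2, 0.9, 0.4]
scores = [1.0, 0.0, 1.0]
print(quality_weighted_loss(losses, scores))
```

In an actual training loop the same weighting would be applied to per-sample loss tensors before reduction; the plain-Python version above only shows the arithmetic.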
Related papers
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images.
We achieve this by marrying image appearance and language understanding to generate a cognitive embedding.
To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention".
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
- Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling [60.80788144261183]
We propose an image downscaling and SR model dubbed SDFlow, which simultaneously learns a bidirectional many-to-many mapping between real-world LR and HR images in an unsupervised manner.
Experimental results on real-world image SR datasets indicate that SDFlow can generate diverse realistic LR and SR images both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-10-08T01:48:34Z)
- ESTISR: Adapting Efficient Scene Text Image Super-resolution for Real-Scenes [25.04435367653037]
Scene text image super-resolution (STISR) has yielded remarkable improvements in accurately recognizing scene text.
We propose a novel Efficient Scene Text Image Super-resolution (ESTISR) Network for resource-limited deployment platforms.
ESTISR consistently outperforms current methods in terms of actual running time and peak memory consumption.
arXiv Detail & Related papers (2023-06-04T19:14:44Z)
- SRTGAN: Triplet Loss based Generative Adversarial Network for Real-World Super-Resolution [13.897062992922029]
An alternative solution called Single Image Super-Resolution (SISR) is a software-driven approach that aims to take a Low-Resolution (LR) image and obtain the HR image.
We introduce a new triplet-based adversarial loss function that exploits the information provided in the LR image by using it as a negative sample.
We propose to fuse the adversarial loss, content loss, perceptual loss, and quality loss to obtain a Super-Resolution (SR) image with high perceptual fidelity.
arXiv Detail & Related papers (2022-11-22T11:17:07Z)
- Rethinking Super-Resolution as Text-Guided Details Generation [21.695227836312835]
We propose a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities.
The proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process.
arXiv Detail & Related papers (2022-07-14T01:46:38Z)
- Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works, and mitigates aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
- Single Image Internal Distribution Measurement Using Non-Local Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely non-local variational autoencoder (NLVAE).
NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood.
Experimental results from seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z)
- Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of an HR image.
Previous model-based methods mainly take the entire image as a whole, and assume the prior distribution between the HR target image and the HR guidance image.
We propose a maximum a posteriori (MAP) estimation model for GISR with two types of prior on the HR target image.
arXiv Detail & Related papers (2022-02-12T15:37:13Z)
- Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling [139.25215100378284]
We propose a hierarchical conditional flow (HCFlow) as a unified framework for image SR and image rescaling.
HCFlow learns a mapping between HR and LR image pairs by simultaneously modelling the distribution of the LR image and the remaining high-frequency component.
To further enhance the performance, other losses such as perceptual loss and GAN loss are combined with the commonly used negative log-likelihood loss in training.
arXiv Detail & Related papers (2021-08-11T16:11:01Z)
- Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input.
Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task.
We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
arXiv Detail & Related papers (2021-03-29T02:58:27Z)
- Deep Generative Adversarial Residual Convolutional Networks for Real-World Super-Resolution [31.934084942626257]
We propose a deep Super-Resolution Residual Convolutional Generative Adversarial Network (SRResCGAN).
It follows the real-world degradation settings by adversarially training the model with pixel-wise supervision in the HR domain from its generated LR counterpart.
The proposed network exploits residual learning by minimizing the energy-based objective function with powerful image regularization and convex optimization techniques.
arXiv Detail & Related papers (2020-05-03T00:12:38Z)