HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution
- URL: http://arxiv.org/abs/2307.16410v1
- Date: Mon, 31 Jul 2023 05:32:57 GMT
- Authors: Minyi Zhao, Yi Xu, Bingjia Li, Jie Wang, Jihong Guan, and Shuigeng Zhou
- Abstract summary: Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images.
In this paper, we propose a novel idea to boost STISR by first enhancing the quality of HR images and then using the enhanced HR images as supervision.
- Score: 32.4847482760475
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scene text image super-resolution (STISR) is an important pre-processing
technique for text recognition from low-resolution scene images. Nowadays,
various methods have been proposed to extract text-specific information from
high-resolution (HR) images to supervise STISR model training. However, due to
uncontrollable factors (e.g. shooting equipment, focus, and environment) in
manually photographing HR images, the quality of HR images cannot be
guaranteed, which unavoidably impacts STISR performance. Observing the quality
issue of HR images, in this paper we propose a novel idea to boost STISR by
first enhancing the quality of HR images and then using the enhanced HR images
as supervision for STISR. Concretely, we develop a new STISR framework,
called High-Resolution ENhancement (HiREN) that consists of two branches and a
quality estimation module. The first branch is developed to recover the
low-resolution (LR) images, and the other is an HR quality enhancement branch
aiming at generating high-quality (HQ) text images based on the HR images to
provide more accurate supervision to the LR images. As the degradation from HQ
to HR may be diverse, and there is no pixel-level supervision for HQ image
generation, we design a kernel-guided enhancement network to handle various
degradations, and exploit the feedback from a recognizer and text-level
annotations as a weak supervision signal to train the HR enhancement branch.
Then, a quality estimation module is employed to estimate the quality of the HQ
images, and these estimates are used to suppress erroneous supervision by
weighting the loss of each image. Extensive experiments on TextZoom show that
HiREN can work well with most existing STISR methods and significantly boost
their performance.
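The loss-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual loss terms, how the quality scores are computed, or how they are normalized, so everything below is an assumption made for illustration: the function names `pixel_l2` and `quality_weighted_loss` are hypothetical, the per-image loss is a plain mean squared error, and the quality scores are taken as given numbers in [0, 1].

```python
from typing import Sequence


def pixel_l2(sr: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two flattened images (assumed loss)."""
    if len(sr) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(sr, target)) / len(sr)


def quality_weighted_loss(
    sr_batch: Sequence[Sequence[float]],
    hq_batch: Sequence[Sequence[float]],
    quality: Sequence[float],
) -> float:
    """Average per-image loss, each scaled by its quality score.

    A low score shrinks the contribution of an image whose enhanced
    (HQ) supervision target is judged unreliable. Dividing by the batch
    size is an assumption of this sketch, not taken from the paper.
    """
    if not (len(sr_batch) == len(hq_batch) == len(quality)):
        raise ValueError("batch sizes must match")
    per_image = [pixel_l2(s, h) for s, h in zip(sr_batch, hq_batch)]
    weighted = [q * loss for q, loss in zip(quality, per_image)]
    return sum(weighted) / len(weighted)


# Two toy 2-pixel images; the second supervision target is fully distrusted.
sr = [[0.0, 0.0], [1.0, 1.0]]
hq = [[1.0, 1.0], [0.0, 0.0]]
loss_trust_first_only = quality_weighted_loss(sr, hq, [1.0, 0.0])
loss_trust_both = quality_weighted_loss(sr, hq, [1.0, 1.0])
```

With a score of 0 for the second image, only the first image contributes, so the batch loss is half of what it would be if both targets were fully trusted.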
Related papers
- One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance [32.88048472109016]
Scene text recognition (STR) from high-resolution (HR) images has been highly successful; however, text reading on low-resolution (LR) images is still challenging.
Recently, many scene text image super-resolution (STISR) models have been proposed to generate super-resolution (SR) images for the LR ones; STR is then performed on the SR images, which boosts recognition performance.
In this paper, we propose a novel method called IMAGE to effectively recognize and recover LR scene text images simultaneously.
arXiv Detail & Related papers (2024-09-22T15:05:25Z)
- Realistic Extreme Image Rescaling via Generative Latent Space Learning [51.85790402171696]
We propose a novel framework called Latent Space Based Image Rescaling (LSBIR) for extreme image rescaling tasks.
LSBIR effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model to generate realistic HR images.
In the first stage, a pseudo-invertible encoder-decoder models the bidirectional mapping between the latent features of the HR image and the target-sized LR image.
In the second stage, the reconstructed features from the first stage are refined by a pre-trained diffusion model to generate more faithful and visually pleasing details.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
- CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images.
We achieve this by marrying image appearance and language understanding to generate a cognitive embedding.
To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention".
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
- SRTGAN: Triplet Loss based Generative Adversarial Network for Real-World Super-Resolution [13.897062992922029]
An alternative solution called Single Image Super-Resolution (SISR) is a software-driven approach that aims to take a Low-Resolution (LR) image and obtain the HR image.
We introduce a new triplet-based adversarial loss function that exploits the information provided in the LR image by using it as a negative sample.
We propose to fuse the adversarial loss, content loss, perceptual loss, and quality loss to obtain a super-resolution (SR) image with high perceptual fidelity.
arXiv Detail & Related papers (2022-11-22T11:17:07Z)
- Rethinking Super-Resolution as Text-Guided Details Generation [21.695227836312835]
We propose a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities.
The proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process.
arXiv Detail & Related papers (2022-07-14T01:46:38Z)
- Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing.
HSRNet achieves better quantitative and visual performance than other works, and mitigates aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
- Single Image Internal Distribution Measurement Using Non-Local Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely non-local variational autoencoder (NLVAE).
NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood.
Experimental results from seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z)
- Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of an HR image.
Previous model-based methods mainly take the entire image as a whole, and assume the prior distribution between the HR target image and the HR guidance image.
We propose a maximum a posteriori (MAP) estimation model for GISR with two types of prior on the HR target image.
arXiv Detail & Related papers (2022-02-12T15:37:13Z)
- Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling [139.25215100378284]
We propose a hierarchical conditional flow (HCFlow) as a unified framework for image SR and image rescaling.
HCFlow learns a mapping between HR and LR image pairs by modelling the distribution of the LR image and the remaining high-frequency component simultaneously.
To further enhance the performance, other losses such as perceptual loss and GAN loss are combined with the commonly used negative log-likelihood loss in training.
arXiv Detail & Related papers (2021-08-11T16:11:01Z)
- Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input.
Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task.
We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
arXiv Detail & Related papers (2021-03-29T02:58:27Z)
- Deep Generative Adversarial Residual Convolutional Networks for Real-World Super-Resolution [31.934084942626257]
We propose a deep Super-Resolution Residual Convolutional Generative Adversarial Network (SRResCGAN).
It follows the real-world degradation settings by adversarially training the model with pixel-wise supervision in the HR domain from its generated LR counterpart.
The proposed network exploits the residual learning by minimizing the energy-based objective function with powerful image regularization and convex optimization techniques.
arXiv Detail & Related papers (2020-05-03T00:12:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.