Related papers: HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution

URL: http://arxiv.org/abs/2307.16410v1
Date: Mon, 31 Jul 2023 05:32:57 GMT
Title: HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution
Authors: Minyi Zhao, Yi Xu, Bingjia Li, Jie Wang, Jihong Guan, and Shuigeng Zhou
Abstract summary: Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images. In this paper, we propose a novel idea to boost STISR by first enhancing the quality of HR images and then using the enhanced HR images as supervision.
Score: 32.4847482760475
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images. Nowadays, various methods have been proposed to extract text-specific information from high-resolution (HR) images to supervise STISR model training. However, due to uncontrollable factors (e.g., shooting equipment, focus, and environment) in manually photographing HR images, the quality of HR images cannot be guaranteed, which unavoidably impacts STISR performance. Observing the quality issue of HR images, in this paper we propose a novel idea to boost STISR by first enhancing the quality of HR images and then using the enhanced HR images as supervision for STISR. Concretely, we develop a new STISR framework, called High-Resolution ENhancement (HiREN), that consists of two branches and a quality estimation module. The first branch is developed to recover the low-resolution (LR) images, and the other is an HR quality enhancement branch aiming at generating high-quality (HQ) text images based on the HR images to provide more accurate supervision to the LR images. As the degradation from HQ to HR may be diverse, and there is no pixel-level supervision for HQ image generation, we design a kernel-guided enhancement network to handle various degradations, and exploit the feedback from a recognizer and text-level annotations as a weak supervision signal to train the HR enhancement branch. Then, a quality estimation module is employed to evaluate the qualities of HQ images, which are used to suppress the erroneous supervision information by weighting the loss of each image. Extensive experiments on TextZoom show that HiREN can work well with most existing STISR methods and significantly boost their performance.
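The abstract's last step, weighting each image's loss by an estimated quality score, can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-squared-error loss, the assumption that quality scores lie in [0, 1], and the function names per_image_mse and quality_weighted_loss are all hypothetical choices made for illustration only.

```python
from typing import Sequence


def per_image_mse(sr: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two flattened images of equal length."""
    if len(sr) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(sr, target)) / len(sr)


def quality_weighted_loss(
    sr_batch: Sequence[Sequence[float]],
    hq_batch: Sequence[Sequence[float]],
    quality: Sequence[float],
) -> float:
    """Batch mean of per-image losses, each scaled by its quality score.

    Assumption (not from the paper): quality[i] is in [0, 1], where a low
    score means the enhanced HQ image i is unreliable supervision and its
    loss should contribute less.
    """
    if not (len(sr_batch) == len(hq_batch) == len(quality)):
        raise ValueError("batch sizes must match")
    losses = [per_image_mse(s, h) for s, h in zip(sr_batch, hq_batch)]
    return sum(q * l for q, l in zip(quality, losses)) / len(losses)


# Two tiny 2-pixel "images"; the second HQ target is judged unreliable.
sr = [[0.0, 0.0], [1.0, 1.0]]
hq = [[1.0, 1.0], [0.0, 0.0]]
loss = quality_weighted_loss(sr, hq, quality=[1.0, 0.0])
```

With a quality score of 0, the second image contributes nothing to the batch loss, which is the suppression effect the abstract describes; how HiREN actually computes and applies the scores is not specified here.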

Related papers

Blind Super Resolution with Reference Images and Implicit Degradation Representation [5.34372866210952]
Degradation kernels should account for not only the degradation process but also the downscaling factor. Applying the same degradation kernel across varying super-resolution scales may be impractical. Our research acknowledges degradation kernels and scaling factors as pivotal elements for the BSR task.
arXiv Detail & Related papers (2025-07-18T13:45:04Z)
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation [74.1872891313184]
HRSeg is an efficient model with high-resolution fine-grained perception. It features two key innovations: High-Resolution Perception (HRP) and High-Resolution Enhancement (HRE).
arXiv Detail & Related papers (2025-07-17T08:09:31Z)
Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image [51.333064033152304]
Recently launched satellites can concurrently acquire HSIs and panchromatic (PAN) images. Hipandas is a novel learning paradigm that reconstructs HRHS images from noisy low-resolution HSIs and high-resolution PAN images.
arXiv Detail & Related papers (2024-12-05T14:39:29Z)
One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance [32.88048472109016]
Scene text recognition (STR) from high-resolution (HR) images has been significantly successful; however, text reading on low-resolution (LR) images is still challenging. Recently, many scene text image super-resolution (STISR) models have been proposed to generate super-resolution (SR) images for the LR ones; STR is then done on the SR images, which boosts recognition performance. In this paper, we propose a novel method called IMAGE to effectively recognize and recover LR scene text images simultaneously.
arXiv Detail & Related papers (2024-09-22T15:05:25Z)
CoSeR: Bridging Image and Language for Cognitive Super-Resolution [74.24752388179992]
We introduce the Cognitive Super-Resolution (CoSeR) framework, empowering SR models with the capacity to comprehend low-resolution images. We achieve this by marrying image appearance and language understanding to generate a cognitive embedding. To further improve image fidelity, we propose a novel condition injection scheme called "All-in-Attention".
arXiv Detail & Related papers (2023-11-27T16:33:29Z)
Learning Many-to-Many Mapping for Unpaired Real-World Image Super-resolution and Downscaling [60.80788144261183]
We propose an image downscaling and SR model dubbed SDFlow, which simultaneously learns a bidirectional many-to-many mapping between real-world LR and HR images in an unsupervised manner. Experimental results on real-world image SR datasets indicate that SDFlow can generate diverse realistic LR and SR images both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-10-08T01:48:34Z)
SRTGAN: Triplet Loss based Generative Adversarial Network for Real-World Super-Resolution [13.897062992922029]
An alternative solution called Single Image Super-Resolution (SISR) is a software-driven approach that aims to take a Low-Resolution (LR) image and obtain the HR image. We introduce a new triplet-based adversarial loss function that exploits the information provided in the LR image by using it as a negative sample. We propose to fuse the adversarial loss, content loss, perceptual loss, and quality loss to obtain a Super-Resolution (SR) image with high perceptual fidelity.
arXiv Detail & Related papers (2022-11-22T11:17:07Z)
Rethinking Super-Resolution as Text-Guided Details Generation [21.695227836312835]
We propose a Text-Guided Super-Resolution (TGSR) framework, which can effectively utilize the information from the text and image modalities. The proposed TGSR could generate HR image details that match the text descriptions through a coarse-to-fine process.
arXiv Detail & Related papers (2022-07-14T01:46:38Z)
Hierarchical Similarity Learning for Aliasing Suppression Image Super-Resolution [64.15915577164894]
A hierarchical image super-resolution network (HSRNet) is proposed to suppress the influence of aliasing. HSRNet achieves better quantitative and visual performance than other works, and mitigates aliasing more effectively.
arXiv Detail & Related papers (2022-06-07T14:55:32Z)
Single Image Internal Distribution Measurement Using Non-Local Variational Autoencoder [11.985083962982909]
This paper proposes a novel image-specific solution, namely non-local variational autoencoder (NLVAE). NLVAE is introduced as a self-supervised strategy that reconstructs high-resolution images using disentangled information from the non-local neighbourhood. Experimental results from seven benchmark datasets demonstrate the effectiveness of the NLVAE model.
arXiv Detail & Related papers (2022-04-02T18:43:55Z)
Memory-augmented Deep Unfolding Network for Guided Image Super-resolution [67.83489239124557]
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of an HR image. Previous model-based methods mainly take the entire image as a whole, and assume the prior distribution between the HR target image and the HR guidance image. We propose a maximum a posteriori (MAP) estimation model for GISR with two types of prior on the HR target image.
arXiv Detail & Related papers (2022-02-12T15:37:13Z)
Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling [139.25215100378284]
We propose a hierarchical conditional flow (HCFlow) as a unified framework for image SR and image rescaling. HCFlow learns a mapping between HR and LR image pairs by modelling the distribution of the LR image and the remaining high-frequency component simultaneously. To further enhance the performance, other losses such as perceptual loss and GAN loss are combined with the commonly used negative log-likelihood loss in training.
arXiv Detail & Related papers (2021-08-11T16:11:01Z)
Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
arXiv Detail & Related papers (2021-03-29T02:58:27Z)
Deep Generative Adversarial Residual Convolutional Networks for Real-World Super-Resolution [31.934084942626257]
We propose a deep Super-Resolution Residual Convolutional Generative Adversarial Network (SRResCGAN). It follows the real-world degradation settings by adversarially training the model with pixel-wise supervision in the HR domain from its generated LR counterpart. The proposed network exploits residual learning by minimizing the energy-based objective function with powerful image regularization and convex optimization techniques.
arXiv Detail & Related papers (2020-05-03T00:12:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.