One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance
- URL: http://arxiv.org/abs/2409.14483v1
- Date: Sun, 22 Sep 2024 15:05:25 GMT
- Title: One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance
- Authors: Minyi Zhao, Yang Wang, Jihong Guan, Shuigeng Zhou,
- Abstract summary: Scene text recognition (STR) from high-resolution (HR) images has been significantly successful, however text reading on low-resolution (LR) images is still challenging.
Recently many scene text image super-resolution (STISR) models have been proposed to generate super-resolution (SR) images for the LR ones, then STR is done on the SR images, which thus boosts recognition performance.
In this paper, we propose a novel method called IMAGE to effectively recognize and recover LR scene text images simultaneously.
- Score: 32.88048472109016
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scene text recognition (STR) from high-resolution (HR) images has been significantly successful, however text reading on low-resolution (LR) images is still challenging due to insufficient visual information. Therefore, recently many scene text image super-resolution (STISR) models have been proposed to generate super-resolution (SR) images for the LR ones, then STR is done on the SR images, which thus boosts recognition performance. Nevertheless, these methods have two major weaknesses. On the one hand, STISR approaches may generate imperfect or even erroneous SR images, which mislead the subsequent recognition of STR models. On the other hand, as the STISR and STR models are jointly optimized, to pursue high recognition accuracy, the fidelity of SR images may be spoiled. As a result, neither the recognition performance nor the fidelity of STISR models are desirable. Then, can we achieve both high recognition performance and good fidelity? To this end, in this paper we propose a novel method called IMAGE (the abbreviation of Iterative MutuAl GuidancE) to effectively recognize and recover LR scene text images simultaneously. Concretely, IMAGE consists of a specialized STR model for recognition and a tailored STISR model to recover LR images, which are optimized separately. And we develop an iterative mutual guidance mechanism, with which the STR model provides high-level semantic information as clue to the STISR model for better super-resolution, meanwhile the STISR model offers essential low-level pixel clue to the STR model for more accurate recognition. Extensive experiments on two LR datasets demonstrate the superiority of our method over the existing works on both recognition performance and super-resolution fidelity.
Related papers
- Low-Res Leads the Way: Improving Generalization for Super-Resolution by
Self-Supervised Learning [45.13580581290495]
This work introduces a novel "Low-Res Leads the Way" (LWay) training framework to enhance the adaptability of SR models to real-world images.
Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction.
Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications.
arXiv Detail & Related papers (2024-03-05T02:29:18Z) - Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z) - ICF-SRSR: Invertible scale-Conditional Function for Self-Supervised
Real-world Single Image Super-Resolution [60.90817228730133]
Single image super-resolution (SISR) is a challenging problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart.
Recent approaches are trained on simulated LR images degraded by simplified down-sampling operators.
We propose a novel Invertible scale-Conditional Function (ICF) which can scale an input image and then restore the original input with different scale conditions.
arXiv Detail & Related papers (2023-07-24T12:42:45Z) - RBSR: Efficient and Flexible Recurrent Network for Burst
Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images.
In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z) - RRSR:Reciprocal Reference-based Image Super-Resolution with Progressive
Feature Alignment and Selection [66.08293086254851]
We propose a reciprocal learning framework to reinforce the learning of a RefSR network.
The newly proposed module aligns reference-input images at multi-scale feature spaces and performs reference-aware feature selection.
We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm.
arXiv Detail & Related papers (2022-11-08T12:39:35Z) - Reference-based Image Super-Resolution with Deformable Attention
Transformer [62.71769634254654]
RefSR aims to exploit auxiliary reference (Ref) images to super-resolve low-resolution (LR) images.
This paper proposes a deformable attention Transformer, namely DATSR, with multiple scales.
Experiments demonstrate that our DATSR achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-07-25T07:07:00Z) - Joint Generative Learning and Super-Resolution For Real-World
Camera-Screen Degradation [6.14297871633911]
In real-world single image super-resolution (SISR) task, the low-resolution image suffers more complicated degradations.
In this paper, we focus on the camera-screen degradation and build a real-world dataset (Cam-ScreenSR)
We propose a joint two-stage model. Firstly, the downsampling degradation GAN(DD-GAN) is trained to model the degradation and produces more various of LR images.
Then the dual residual channel attention network (DuRCAN) learns to recover the SR image.
arXiv Detail & Related papers (2020-08-01T07:10:13Z) - DDet: Dual-path Dynamic Enhancement Network for Real-World Image
Super-Resolution [69.2432352477966]
Real image super-resolution(Real-SR) focus on the relationship between real-world high-resolution(HR) and low-resolution(LR) image.
In this article, we propose a Dual-path Dynamic Enhancement Network(DDet) for Real-SR.
Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pair.
arXiv Detail & Related papers (2020-02-25T18:24:51Z) - Characteristic Regularisation for Super-Resolving Face Images [81.84939112201377]
Existing facial image super-resolution (SR) methods focus mostly on improving artificially down-sampled low-resolution (LR) imagery.
Previous unsupervised domain adaptation (UDA) methods address this issue by training a model using unpaired genuine LR and HR data.
This renders the model overstretched with two tasks: consistifying the visual characteristics and enhancing the image resolution.
We formulate a method that joins the advantages of conventional SR and UDA models.
arXiv Detail & Related papers (2019-12-30T16:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.