Related papers: One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance

One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance

URL: http://arxiv.org/abs/2409.14483v1
Date: Sun, 22 Sep 2024 15:05:25 GMT
Title: One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance
Authors: Minyi Zhao, Yang Wang, Jihong Guan, Shuigeng Zhou,
Abstract summary: Scene text recognition (STR) from high-resolution (HR) images has been significantly successful, however text reading on low-resolution (LR) images is still challenging. Recently many scene text image super-resolution (STISR) models have been proposed to generate super-resolution (SR) images for the LR ones, then STR is done on the SR images, which thus boosts recognition performance. In this paper, we propose a novel method called IMAGE to effectively recognize and recover LR scene text images simultaneously.
Score: 32.88048472109016
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Scene text recognition (STR) from high-resolution (HR) images has been significantly successful, however text reading on low-resolution (LR) images is still challenging due to insufficient visual information. Therefore, recently many scene text image super-resolution (STISR) models have been proposed to generate super-resolution (SR) images for the LR ones, then STR is done on the SR images, which thus boosts recognition performance. Nevertheless, these methods have two major weaknesses. On the one hand, STISR approaches may generate imperfect or even erroneous SR images, which mislead the subsequent recognition of STR models. On the other hand, as the STISR and STR models are jointly optimized, to pursue high recognition accuracy, the fidelity of SR images may be spoiled. As a result, neither the recognition performance nor the fidelity of STISR models are desirable. Then, can we achieve both high recognition performance and good fidelity? To this end, in this paper we propose a novel method called IMAGE (the abbreviation of Iterative MutuAl GuidancE) to effectively recognize and recover LR scene text images simultaneously. Concretely, IMAGE consists of a specialized STR model for recognition and a tailored STISR model to recover LR images, which are optimized separately. And we develop an iterative mutual guidance mechanism, with which the STR model provides high-level semantic information as clue to the STISR model for better super-resolution, meanwhile the STISR model offers essential low-level pixel clue to the STR model for more accurate recognition. Extensive experiments on two LR datasets demonstrate the superiority of our method over the existing works on both recognition performance and super-resolution fidelity.

Related papers

Self-Supervised Image Super-Resolution Quality Assessment based on Content-Free Multi-Model Oriented Representation Learning [3.0175628677371935]
Super-resolution (SR) applied to real-world low-resolution (LR) images often results in complex, irregular degradations.<n>In this work, we introduce a no-reference SR-IQA approach tailored for such highly ill-posed realistic settings.<n>The proposed method enables domain-adaptive IQA for real-world SR applications.
arXiv Detail & Related papers (2026-02-11T11:15:34Z)
Dual-domain Adaptation Networks for Realistic Image Super-resolution [81.34345637776408]
Realistic image super-resolution (SR) focuses on transforming real-world low-resolution (LR) images into high-resolution (HR) ones.<n>Current methods struggle with limited real-world LR-HR data, impacting the learning of basic image features.<n>We introduce a novel approach, which is able to efficiently adapt pre-trained image SR models from simulated to real-world datasets.
arXiv Detail & Related papers (2025-11-21T12:57:23Z)
One-Step Diffusion-based Real-World Image Super-Resolution with Visual Perception Distillation [53.24542646616045]
We propose VPD-SR, a novel visual perception diffusion distillation framework specifically designed for image super-resolution (SR) generation.<n>VPD-SR consists of two components: Explicit Semantic-aware Supervision (ESS) and High-frequency Perception (HFP) loss.<n>The proposed VPD-SR achieves superior performance compared to both previous state-of-the-art methods and the teacher model with just one-step sampling.
arXiv Detail & Related papers (2025-06-03T08:28:13Z)
GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution [15.563111624900865]
GuideSR is a novel single-step diffusion-based image super-resolution (SR) model specifically designed to enhance image fidelity. Our approach consistently outperforms existing methods across various reference-based metrics including PSNR, SSIM, LPIPS, DISTS and FID.
arXiv Detail & Related papers (2025-05-01T17:48:25Z)
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning [45.13580581290495]
This work introduces a novel "Low-Res Leads the Way" (LWay) training framework to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction. Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications.
arXiv Detail & Related papers (2024-03-05T02:29:18Z)
Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images. Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance. We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z)
ICF-SRSR: Invertible scale-Conditional Function for Self-Supervised Real-world Single Image Super-Resolution [60.90817228730133]
Single image super-resolution (SISR) is a challenging problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart. Recent approaches are trained on simulated LR images degraded by simplified down-sampling operators. We propose a novel Invertible scale-Conditional Function (ICF) which can scale an input image and then restore the original input with different scale conditions.
arXiv Detail & Related papers (2023-07-24T12:42:45Z)
RBSR: Efficient and Flexible Recurrent Network for Burst Super-Resolution [57.98314517861539]
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images. In this paper, we suggest fusing cues frame-by-frame with an efficient and flexible recurrent network.
arXiv Detail & Related papers (2023-06-30T12:14:13Z)
RRSR:Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection [66.08293086254851]
We propose a reciprocal learning framework to reinforce the learning of a RefSR network. The newly proposed module aligns reference-input images at multi-scale feature spaces and performs reference-aware feature selection. We empirically show that multiple recent state-of-the-art RefSR models can be consistently improved with our reciprocal learning paradigm.
arXiv Detail & Related papers (2022-11-08T12:39:35Z)
Reference-based Image Super-Resolution with Deformable Attention Transformer [62.71769634254654]
RefSR aims to exploit auxiliary reference (Ref) images to super-resolve low-resolution (LR) images. This paper proposes a deformable attention Transformer, namely DATSR, with multiple scales. Experiments demonstrate that our DATSR achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-07-25T07:07:00Z)
Joint Generative Learning and Super-Resolution For Real-World Camera-Screen Degradation [6.14297871633911]
In real-world single image super-resolution (SISR) task, the low-resolution image suffers more complicated degradations. In this paper, we focus on the camera-screen degradation and build a real-world dataset (Cam-ScreenSR) We propose a joint two-stage model. Firstly, the downsampling degradation GAN(DD-GAN) is trained to model the degradation and produces more various of LR images. Then the dual residual channel attention network (DuRCAN) learns to recover the SR image.
arXiv Detail & Related papers (2020-08-01T07:10:13Z)
DDet: Dual-path Dynamic Enhancement Network for Real-World Image Super-Resolution [69.2432352477966]
Real image super-resolution(Real-SR) focus on the relationship between real-world high-resolution(HR) and low-resolution(LR) image. In this article, we propose a Dual-path Dynamic Enhancement Network(DDet) for Real-SR. Unlike conventional methods which stack up massive convolutional blocks for feature representation, we introduce a content-aware framework to study non-inherently aligned image pair.
arXiv Detail & Related papers (2020-02-25T18:24:51Z)
Characteristic Regularisation for Super-Resolving Face Images [81.84939112201377]
Existing facial image super-resolution (SR) methods focus mostly on improving artificially down-sampled low-resolution (LR) imagery. Previous unsupervised domain adaptation (UDA) methods address this issue by training a model using unpaired genuine LR and HR data. This renders the model overstretched with two tasks: consistifying the visual characteristics and enhancing the image resolution. We formulate a method that joins the advantages of conventional SR and UDA models.
arXiv Detail & Related papers (2019-12-30T16:27:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.