Scene Text Image Super-Resolution in the Wild
- URL: http://arxiv.org/abs/2005.03341v3
- Date: Sun, 2 Aug 2020 03:27:43 GMT
- Title: Scene Text Image Super-Resolution in the Wild
- Authors: Wenjia Wang, Enze Xie, Xuebo Liu, Wenhai Wang, Ding Liang, Chunhua
Shen, and Xiang Bai
- Abstract summary: Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones.
Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images.
We propose a real scene text SR dataset, termed TextZoom.
It contains paired real low-resolution and high-resolution images captured by cameras with different focal lengths in the wild.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-resolution text images are often seen in natural scenes such as documents
captured by mobile phones. Recognizing low-resolution text images is
challenging because they lose detailed content information, leading to poor
recognition accuracy. An intuitive solution is to introduce super-resolution
(SR) techniques as pre-processing. However, previous single image
super-resolution (SISR) methods are trained on synthetic low-resolution images
(e.g., bicubic down-sampling), which is overly simple and not suitable for real
low-resolution text recognition. To this end, we propose a real scene text SR
dataset, termed TextZoom. It contains paired real low-resolution and
high-resolution images which are captured by cameras with different focal
lengths in the wild. It is more authentic and challenging than synthetic data,
as shown in Fig. 1. We argue that improving the recognition accuracy is the
ultimate goal for Scene Text SR. To this end, a new Text Super-Resolution
Network, termed TSRN, with three novel modules, is developed. (1) A sequential
residual block is proposed to extract the sequential information of the text
images. (2) A boundary-aware loss is designed to sharpen the character
boundaries. (3) A central alignment module is proposed to relieve the
misalignment problem in TextZoom. Extensive experiments on TextZoom demonstrate
that our TSRN largely improves the recognition accuracy, by over 13% for CRNN and
by nearly 9.0% for ASTER and MORAN, compared to synthetic SR data. Furthermore,
our TSRN clearly outperforms 7 state-of-the-art SR methods in boosting the
recognition accuracy of LR images in TextZoom. For example, it outperforms
LapSRN by over 5% and 8% on the recognition accuracy of ASTER and CRNN,
respectively. Our
results suggest that low-resolution text recognition in the wild is far from
being solved; thus, more research effort is needed.
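The abstract names TSRN's three modules without showing how they fit together. Below is a minimal, hypothetical PyTorch sketch of the first two ideas: a sequential residual block that scans feature-map rows with a bidirectional RNN to capture left-to-right character context, and a simple gradient-matching loss standing in for the boundary-aware loss. The class names, channel sizes, and layer layout are our assumptions for illustration, not the authors' released implementation; the central alignment module (typically a spatial-transformer-style warp) is omitted for brevity.

```python
# Hypothetical sketch only: module names, sizes, and layout are assumptions,
# not the authors' released TSRN code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequentialResidualBlock(nn.Module):
    """Residual CNN block with a bidirectional GRU scanned along each
    feature-map row, so features see left-to-right character context."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        # Bidirectional GRU: 2 * (channels // 2) outputs == channels.
        self.rnn = nn.GRU(channels, channels // 2,
                          bidirectional=True, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map of a horizontal text line.
        y = F.relu(self.bn(self.conv(x)))
        b, c, h, w = y.shape
        # Treat each row as a length-W sequence read left to right.
        seq = y.permute(0, 2, 3, 1).reshape(b * h, w, c)
        seq, _ = self.rnn(seq)
        y = seq.reshape(b, h, w, c).permute(0, 3, 1, 2)
        return x + y  # residual connection keeps the block stackable


def boundary_aware_loss(sr: torch.Tensor, hr: torch.Tensor) -> torch.Tensor:
    """L1 loss on horizontal/vertical image gradients: a simple stand-in
    for a boundary-aware loss that penalizes blurred character edges."""
    def grads(img):
        dx = img[..., :, 1:] - img[..., :, :-1]   # gradient along width
        dy = img[..., 1:, :] - img[..., :-1, :]   # gradient along height
        return dx, dy

    sr_dx, sr_dy = grads(sr)
    hr_dx, hr_dy = grads(hr)
    return F.l1_loss(sr_dx, hr_dx) + F.l1_loss(sr_dy, hr_dy)


if __name__ == "__main__":
    block = SequentialResidualBlock(64)
    feats = torch.randn(2, 64, 16, 64)        # (B, C, H, W)
    assert block(feats).shape == feats.shape  # shape-preserving
    sr, hr = torch.rand(2, 3, 32, 128), torch.rand(2, 3, 32, 128)
    print(boundary_aware_loss(sr, hr).item())
```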
Related papers
- Image Super-Resolution with Text Prompt Diffusion
We introduce text prompts to image SR to provide degradation priors.
PromptSR utilizes a pre-trained language model (e.g., T5 or CLIP) to enhance restoration.
Experiments indicate that introducing text prompts into SR yields excellent results on both synthetic and real-world images.
arXiv Detail & Related papers (2023-11-24T05:11:35Z)
- Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z)
- Scene Text Image Super-resolution based on Text-conditional Diffusion Models
Scene Text Image Super-resolution (STISR) has recently achieved great success as a preprocessing method for scene text recognition.
In this study, we leverage text-conditional diffusion models (DMs) for STISR tasks.
We propose a novel framework for LR-HR paired text image datasets.
arXiv Detail & Related papers (2023-11-16T10:32:18Z)
- TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution
TextDiff is a diffusion-based framework tailored for scene text image super-resolution.
It achieves state-of-the-art (SOTA) performance on public benchmark datasets.
Our proposed MRD module is plug-and-play and effectively sharpens the text edges produced by SOTA methods.
arXiv Detail & Related papers (2023-08-13T11:02:16Z)
- Self-supervised Character-to-Character Distillation for Text Recognition
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, and 0.24 dB (PSNR) / 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks
We present TATSR, a Text-Aware Text Super-Resolution framework.
It effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss.
It outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.
arXiv Detail & Related papers (2022-10-13T11:48:45Z)
- Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting
We propose a novel cost-efficient Dynamic Low-resolution Distillation (DLD) text spotting framework.
It aims to infer images at different small but recognizable resolutions, achieving a better balance between accuracy and efficiency.
The proposed method can be optimized end-to-end and adopted in any current text spotting framework to improve its practicality.
arXiv Detail & Related papers (2022-07-14T06:49:59Z)
- C3-STISR: Scene Text Image Super-resolution with Triple Clues
Scene text image super-resolution (STISR) has been regarded as an important pre-processing task for text recognition.
Most recent approaches use the recognizer's feedback as clues to guide super-resolution.
We present a novel method, C3-STISR, that jointly exploits the recognizer's feedback and visual and linguistic information as clues to guide super-resolution.
arXiv Detail & Related papers (2022-04-29T12:39:51Z)
- Hyperspectral Image Super-resolution via Deep Progressive Zero-centric Residual Learning
Cross-modality distribution of spatial and spectral information makes the problem challenging.
We propose a novel lightweight deep neural network-based framework, namely PZRes-Net.
Our framework learns a high-resolution and zero-centric residual image, which contains high-frequency spatial details of the scene.
arXiv Detail & Related papers (2020-06-18T06:32:11Z)