Related papers: Text Prior Guided Scene Text Image Super-resolution

Text Prior Guided Scene Text Image Super-resolution

URL: http://arxiv.org/abs/2106.15368v2
Date: Wed, 30 Jun 2021 14:14:56 GMT
Title: Text Prior Guided Scene Text Image Super-resolution
Authors: Jianqi Ma, Shi Guo, Lei Zhang
Abstract summary: Scene text image super-resolution (STISR) aims to improve the resolution and visual quality of low-resolution (LR) scene text images. We make an attempt to embed categorical text prior into STISR model training. We present a multi-stage text prior guided super-resolution framework for STISR.
Score: 11.396781380648756
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene text image super-resolution (STISR) aims to improve the resolution and visual quality of low-resolution (LR) scene text images, and consequently boost the performance of text recognition. However, most of existing STISR methods regard text images as natural scene images, ignoring the categorical information of text. In this paper, we make an inspiring attempt to embed categorical text prior into STISR model training. Specifically, we adopt the character probability sequence as the text prior, which can be obtained conveniently from a text recognition model. The text prior provides categorical guidance to recover high-resolution (HR) text images. On the other hand, the reconstructed HR image can refine the text prior in return. Finally, we present a multi-stage text prior guided super-resolution (TPGSR) framework for STISR. Our experiments on the benchmark TextZoom dataset show that TPGSR can not only effectively improve the visual quality of scene text images, but also significantly improve the text recognition accuracy over existing STISR methods. Our model trained on TextZoom also demonstrates certain generalization capability to the LR images in other datasets.

Related papers

TextGuider: Training-Free Guidance for Text Rendering via Attention Alignment [68.91073792449201]
We propose TextGuider, a training-free method that encourages accurate and complete text appearance.<n>Specifically, we analyze attention patterns in Multi-Modal Diffusion Transformer(MM-DiT) models, particularly for text-related tokens intended to be rendered in the image.<n>Our method achieves state-of-the-art performance in test-time text rendering, with significant gains in recall and strong results in OCR accuracy and CLIP score.
arXiv Detail & Related papers (2025-12-10T06:18:30Z)
TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance [24.242452422416438]
We introduce TextSR, a multimodal diffusion model specifically designed for Multilingual Text Image Super-Resolution.<n>By integrating text character priors with the low-resolution text images, our model effectively guides the super-resolution process.<n>The superior performance of our model on both the TextZoom and TextVQA datasets sets a new benchmark for STISR.
arXiv Detail & Related papers (2025-05-29T05:40:35Z)
Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets. We introduce a novel method named Decoder Pre-training with only text for STR (DPTR) DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features. With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
Image Super-Resolution with Text Prompt Diffusion [118.023531454099]
We introduce text prompts to image SR to provide degradation priors. PromptSR utilizes the pre-trained language model (e.g., T5 or CLIP) to enhance restoration. Experiments indicate that introducing text prompts into SR, yields excellent results on both synthetic and real-world images.
arXiv Detail & Related papers (2023-11-24T05:11:35Z)
Scene Text Image Super-resolution based on Text-conditional Diffusion Models [0.0]
Scene Text Image Super-resolution (STISR) has recently achieved great success as a preprocessing method for scene text recognition. In this study, we leverage text-conditional diffusion models (DMs) for STISR tasks. We propose a novel framework for LR-HR paired text image datasets.
arXiv Detail & Related papers (2023-11-16T10:32:18Z)
Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We take the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images. Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution [18.73348268987249]
TextDiff is a diffusion-based framework tailored for scene text image super-resolution. It achieves state-of-the-art (SOTA) performance on public benchmark datasets. Our proposed MRD module is plug-and-play that effectively sharpens the text edges produced by SOTA methods.
arXiv Detail & Related papers (2023-08-13T11:02:16Z)
A Benchmark for Chinese-English Scene Text Image Super-resolution [15.042152725255171]
Scene Text Image Super-resolution (STISR) aims to recover high-resolution (HR) scene text images with visually pleasant and readable text content from low-resolution (LR) input. Most existing works focus on recovering English texts, which have relatively simple character structures. We propose a real-world Chinese-English benchmark dataset, namely Real-CE, for the task of STISR.
arXiv Detail & Related papers (2023-08-07T02:57:48Z)
Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks [48.81850740907517]
We present TATSR, a Text-Aware Text Super-Resolution framework. It effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss. It outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.
arXiv Detail & Related papers (2022-10-13T11:48:45Z)
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution [13.934846626570286]
Scene text image super-resolution aims to increase the resolution and readability of the text in low-resolution images. It remains difficult to reconstruct high-resolution images for spatially deformed texts, especially rotated and curve-shaped ones. We propose a CNN based Text ATTention network (TATT) to address this problem.
arXiv Detail & Related papers (2022-03-17T15:28:29Z)
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text [23.04601165885908]
We propose TextOCR, an arbitrary-shaped scene text detection and recognition with 900k annotated words collected on real images. We show that current state-of-the-art text-recognition (OCR) models fail to perform well on TextOCR. We use a TextOCR trained OCR model to create PixelM4C model which can do scene text based reasoning on an image in an end-to-end fashion.
arXiv Detail & Related papers (2021-05-12T07:50:42Z)
Scene Text Image Super-Resolution in the Wild [112.90416737357141]
Low-resolution text images are often seen in natural scenes such as documents captured by mobile phones. Previous single image super-resolution (SISR) methods are trained on synthetic low-resolution images. We pro-pose a real scene text SR dataset, termed TextZoom. It contains paired real low-resolution and high-resolution images captured by cameras with different focal length in the wild.
arXiv Detail & Related papers (2020-05-07T09:18:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.