Learning Generative Structure Prior for Blind Text Image Super-resolution
- URL: http://arxiv.org/abs/2303.14726v1
- Date: Sun, 26 Mar 2023 13:54:28 GMT
- Title: Learning Generative Structure Prior for Blind Text Image Super-resolution
- Authors: Xiaoming Li, Wangmeng Zuo, Chen Change Loy
- Abstract summary: We present a novel prior that focuses more on the character structure.
To restrict the generative space of StyleGAN, we store the discrete features for each character in a codebook.
The proposed structure prior exerts stronger character-specific guidance to restore faithful and precise strokes of a designated character.
- Score: 153.05759524358467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blind text image super-resolution (SR) is challenging as one needs to cope
with diverse font styles and unknown degradation. To address the problem,
existing methods perform character recognition in parallel to regularize the SR
task, either through a loss constraint or intermediate feature condition.
Nonetheless, the high-level prior could still fail when encountering severe
degradation. The problem is further compounded given characters of complex
structures, e.g., Chinese characters that combine multiple pictographic or
ideographic symbols into a single character. In this work, we present a novel
prior that focuses more on the character structure. In particular, we learn to
encapsulate rich and diverse structures in a StyleGAN and exploit such
generative structure priors for restoration. To restrict the generative space
of StyleGAN so that it obeys the structure of characters yet remains flexible
in handling different font styles, we store the discrete features for each
character in a codebook. The code subsequently drives the StyleGAN to generate
high-resolution structural details to aid text SR. Compared to priors based on
character recognition, the proposed structure prior exerts stronger
character-specific guidance to restore faithful and precise strokes of a
designated character. Extensive experiments on synthetic and real datasets
demonstrate the compelling performance of the proposed generative structure
prior in facilitating robust text SR.
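To make the described mechanism concrete, the sketch below is a minimal PyTorch illustration of the pipeline the abstract outlines: a character's discrete code is retrieved from a codebook and decoded into high-resolution structural features that an SR branch could fuse with low-resolution image features. Every module name, dimension, and the toy generator are our own assumptions, not the authors' released code; the real prior uses a StyleGAN synthesis network.

```python
import torch
import torch.nn as nn

class StructurePrior(nn.Module):
    """Toy stand-in for the codebook-driven generative structure prior."""

    def __init__(self, num_chars: int = 6763, code_dim: int = 512):
        super().__init__()
        # One discrete structure code per character class (the paper stores
        # these in a codebook learned from high-quality glyphs).
        self.codebook = nn.Embedding(num_chars, code_dim)
        # Toy generator standing in for the StyleGAN synthesis network:
        # decodes a code into a small map of structural features.
        self.generator = nn.Sequential(
            nn.Linear(code_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        codes = self.codebook(char_ids)   # (B, code_dim) discrete-code lookup
        return self.generator(codes)      # (B, 64, 32, 32) structural features

prior = StructurePrior()
structure = prior(torch.tensor([42, 7]))  # priors for two character classes
print(structure.shape)  # torch.Size([2, 64, 32, 32])
```

In the paper's setting these structural features would condition the SR network, giving it character-specific stroke guidance rather than only a recognition loss.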
Related papers
- SAN: Structure-Aware Network for Complex and Long-tailed Chinese Text Recognition [9.190324058948987]
We propose a structure-aware network that exploits hierarchical composition information to improve the recognition of complex characters.
Experiments demonstrate that the proposed approach significantly improves recognition of complex and tail characters, yielding better overall performance.
arXiv Detail & Related papers (2024-11-10T07:41:00Z)
- HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition [47.86479271322264]
We propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters.
HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character.
This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features.
arXiv Detail & Related papers (2024-03-20T17:20:48Z)
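The multi-hot, radical-sharing encoding that the HierCode entry above describes can be sketched in a few lines. The radical vocabulary and decompositions below are illustrative assumptions, not HierCode's actual tables or tree encoding:

```python
RADICALS = ["木", "目", "相", "口", "日", "月", "明", "心"]
DECOMP = {            # hypothetical radical decompositions
    "想": ["相", "心"],   # 想 is composed of 相 over 心
    "明": ["日", "月"],
    "相": ["木", "目"],
}

def multi_hot(char: str) -> list[int]:
    """Multi-hot vector over the shared radical vocabulary."""
    parts = set(DECOMP.get(char, [char]))
    return [1 if r in parts else 0 for r in RADICALS]

# Even if 想 never appeared in training, its radicals 相 and 心 did,
# so its code is still informative, enabling zero-shot recognition.
print(multi_hot("想"))  # [0, 0, 1, 0, 0, 0, 0, 1]
```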
- Instruction-Guided Scene Text Recognition [51.853730414264625]
We propose a novel instruction-guided scene text recognition (IGTR) paradigm that formulates STR as an instruction learning problem.
We develop a lightweight instruction encoder, a cross-modal feature fusion module, and a multi-task answer head, which together guide nuanced text image understanding.
IGTR outperforms existing models by significant margins while maintaining a small model size and fast inference.
arXiv Detail & Related papers (2024-01-31T14:13:01Z)
- Image Super-Resolution with Text Prompt Diffusion [118.023531454099]
We introduce text prompts to image SR to provide degradation priors.
PromptSR utilizes a pre-trained language model (e.g., T5 or CLIP) to enhance restoration.
Experiments indicate that introducing text prompts into SR yields excellent results on both synthetic and real-world images.
arXiv Detail & Related papers (2023-11-24T05:11:35Z)
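A schematic of the prompt-conditioning idea in the PromptSR entry above: a degradation description is encoded and used to modulate the SR features. The toy embedding layer stands in for a frozen T5/CLIP text encoder, and all module names and dimensions are our assumptions, not PromptSR's architecture:

```python
import torch
import torch.nn as nn

class PromptConditionedSR(nn.Module):
    def __init__(self, vocab: int = 1000, txt_dim: int = 64, img_ch: int = 32):
        super().__init__()
        self.text_enc = nn.EmbeddingBag(vocab, txt_dim)  # stand-in for a frozen LM
        self.to_scale_shift = nn.Linear(txt_dim, 2 * img_ch)
        self.body = nn.Conv2d(3, img_ch, 3, padding=1)
        self.up = nn.Sequential(nn.Conv2d(img_ch, 3 * 4, 3, padding=1),
                                nn.PixelShuffle(2))       # 2x upscaling head

    def forward(self, lr: torch.Tensor, prompt_ids: torch.Tensor) -> torch.Tensor:
        t = self.text_enc(prompt_ids)                     # (B, txt_dim) prompt embedding
        scale, shift = self.to_scale_shift(t).chunk(2, dim=1)
        f = self.body(lr)
        # Feature-wise modulation by the degradation prompt.
        f = f * (1 + scale[..., None, None]) + shift[..., None, None]
        return self.up(f)

model = PromptConditionedSR()
lr = torch.randn(2, 3, 24, 24)
prompt = torch.randint(0, 1000, (2, 5))  # e.g. tokens for "heavy blur, noise"
print(model(lr, prompt).shape)  # torch.Size([2, 3, 48, 48])
```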
- VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization [52.870638830417]
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
Specifically, we pre-train a VQGAN to encapsulate font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes.
arXiv Detail & Related papers (2023-08-27T06:32:20Z)
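The codebook lookup at the heart of the VQ-Font entry above (and of vector quantization generally) can be sketched as a nearest-neighbor snap with a straight-through gradient; dimensions and names below are illustrative, not VQ-Font's actual VQGAN:

```python
import torch

def quantize(z: torch.Tensor, codebook: torch.Tensor):
    """Replace each feature vector in z with its nearest codebook entry.

    z: (N, D) continuous glyph features; codebook: (K, D) learned tokens.
    """
    dists = torch.cdist(z, codebook)   # (N, K) pairwise distances
    idx = dists.argmin(dim=1)          # nearest token per feature
    z_q = codebook[idx]
    # Straight-through estimator: gradients flow through as if identity.
    return z + (z_q - z).detach(), idx

codebook = torch.randn(256, 64)        # 256 tokens of dimension 64
z = torch.randn(10, 64, requires_grad=True)
z_q, idx = quantize(z, codebook)
print(z_q.shape, idx[:3])
```

Snapping features to such tokens is what pulls synthesized strokes toward the prior learned from real glyphs, closing the gap the entry mentions.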
- Handwritten Text Generation from Visual Archetypes [25.951540903019467]
We devise a Transformer-based model for few-shot styled handwritten text generation.
We obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset.
arXiv Detail & Related papers (2023-03-27T14:58:20Z)
- Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks [48.81850740907517]
We present TATSR, a Text-Aware Text Super-Resolution framework.
It effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss.
It outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.
arXiv Detail & Related papers (2022-10-13T11:48:45Z)
- ZiGAN: Fine-grained Chinese Calligraphy Font Generation via a Few-shot Style Transfer Approach [7.318027179922774]
ZiGAN is a powerful end-to-end Chinese calligraphy font generation framework.
It does not require any manual operation or redundant preprocessing to generate fine-grained target-style characters.
Our method achieves state-of-the-art generalization in few-shot Chinese character style transfer.
arXiv Detail & Related papers (2021-08-08T09:50:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.