Exploring Font-independent Features for Scene Text Recognition
- URL: http://arxiv.org/abs/2009.07447v1
- Date: Wed, 16 Sep 2020 03:36:59 GMT
- Title: Exploring Font-independent Features for Scene Text Recognition
- Authors: Yizhi Wang and Zhouhui Lian
- Abstract summary: Scene text recognition (STR) has been extensively studied in the last few years.
Many recently proposed methods are specially designed to accommodate the arbitrary shapes, layouts and orientations of scene texts.
These methods, in which the font features and content features of characters are entangled, perform poorly when recognizing text in scene images whose texts appear in novel font styles.
- Score: 22.34023249700896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text recognition (STR) has been extensively studied in the last few years. Many recently proposed methods are specially designed to accommodate the arbitrary shapes, layouts and orientations of scene texts, but ignore that various font (or writing) styles also pose severe challenges to STR. These methods, in which the font features and content features of characters are entangled, perform poorly when recognizing text in scene images whose texts appear in novel font styles. To address this problem, we explore font-independent features of scene texts via the attentional generation of glyphs in a large number of font styles. Specifically, we introduce trainable font embeddings to shape the font styles of the generated glyphs, so that the image feature of the scene text represents only its essential patterns. The generation process is directed by a spatial attention mechanism, which copes effectively with irregular texts and generates higher-quality glyphs than existing image-to-image translation methods. Experiments conducted on several STR benchmarks demonstrate the superiority of our method compared to the state of the art.
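The core mechanism described in the abstract (trainable font embeddings plus spatially attended content features driving a glyph generator) can be sketched compactly. The following PyTorch sketch is purely illustrative and is not the authors' actual architecture; all module names, layer choices, and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalGlyphGenerator(nn.Module):
    """Illustrative: attend over scene-text features, condition on a font."""

    def __init__(self, feat_dim=256, num_fonts=100, font_dim=64, glyph_size=32):
        super().__init__()
        # Trainable font embeddings: one style vector per target font.
        self.font_emb = nn.Embedding(num_fonts, font_dim)
        # Scores each spatial location of the image feature map.
        self.attn = nn.Conv2d(feat_dim, 1, kernel_size=1)
        # Maps [content ; font style] to a flattened glyph image.
        self.decode = nn.Sequential(
            nn.Linear(feat_dim + font_dim, 512), nn.ReLU(),
            nn.Linear(512, glyph_size * glyph_size), nn.Sigmoid(),
        )
        self.glyph_size = glyph_size

    def forward(self, feat_map, font_id):
        # feat_map: (B, C, H, W) features of a scene-text image crop.
        B, C, H, W = feat_map.shape
        scores = self.attn(feat_map).view(B, -1)             # (B, H*W)
        weights = F.softmax(scores, dim=1).view(B, 1, H, W)  # attention map
        content = (feat_map * weights).sum(dim=(2, 3))       # (B, C)
        style = self.font_emb(font_id)                       # (B, font_dim)
        glyph = self.decode(torch.cat([content, style], dim=1))
        return glyph.view(B, 1, self.glyph_size, self.glyph_size), weights
```

Because style enters only through the embedding table, supervising the same content vector to render correctly in many fonts pressures it to keep only font-independent patterns.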
Related papers
- Decoupling Layout from Glyph in Online Chinese Handwriting Generation [6.566541829858544]
We develop a text line layout generator and stylized font synthesizer.
The layout generator performs in-context-like learning based on the text content and the provided style references to generate positions for each glyph autoregressively.
The font synthesizer, which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net based diffusion denoiser, generates each glyph at its position while imitating the calligraphy style extracted from the given style references.
arXiv Detail & Related papers (2024-10-03T08:46:17Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover the corresponding printed character images with disentangled content and orientation information (a minimal disentangling sketch appears after this list).
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- Deformation Robust Text Spotting with Geometric Prior [5.639053898266709]
We develop a robust text spotting method (DR TextSpotter) to solve the problem of recognizing characters under the complex deformations that arise from different fonts.
A graph convolutional network is constructed to fuse the character features with the landmark features, and then performs semantic reasoning to enhance the discrimination between different characters (a minimal fusion sketch appears after this list).
arXiv Detail & Related papers (2023-08-31T02:13:15Z)
- VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization [52.870638830417]
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
Specifically, we pre-train a VQGAN to encapsulate the font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes (a minimal codebook-quantization sketch appears after this list).
arXiv Detail & Related papers (2023-08-27T06:32:20Z)
- Weakly Supervised Scene Text Generation for Low-resource Languages [19.243705770491577]
A large number of annotated training images are crucial for training successful scene text recognition models.
Existing scene text generation methods typically rely on a large amount of paired data, which is difficult to obtain for low-resource languages.
We propose a novel weakly supervised scene text generation method that leverages a few recognition-level labels as weak supervision.
arXiv Detail & Related papers (2023-06-25T15:26:06Z)
- Handwritten Text Generation from Visual Archetypes [25.951540903019467]
We devise a Transformer-based model for few-shot styled handwritten text generation.
We obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset.
arXiv Detail & Related papers (2023-03-27T14:58:20Z)
- Learning Generative Structure Prior for Blind Text Image Super-resolution [153.05759524358467]
We present a novel prior that focuses more on the character structure.
To restrict the generative space of StyleGAN, we store the discrete features for each character in a codebook.
The proposed structure prior exerts stronger character-specific guidance to restore faithful and precise strokes of a designated character.
arXiv Detail & Related papers (2023-03-26T13:54:28Z)
- Few-shot Font Generation by Learning Style Difference and Similarity [84.76381937516356]
We propose a novel font generation approach by learning the Difference between different styles and the Similarity of the same style (DS-Font).
Specifically, we propose a multi-layer style projector for style encoding and realize a distinctive style representation via our proposed Cluster-level Contrastive Style (CCS) loss (a minimal contrastive-loss sketch appears after this list).
arXiv Detail & Related papers (2023-01-24T13:57:25Z)
- Toward Understanding WordArt: Corner-Guided Transformer for Scene Text Recognition [63.6608759501803]
We propose to recognize artistic text at three levels.
Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to variations in appearance and shape.
Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification.
Thirdly, we utilize a Transformer to learn the global feature at the image level and to model the global relationships among the corner points.
arXiv Detail & Related papers (2022-07-31T14:11:05Z)
- GenText: Unsupervised Artistic Text Generation via Decoupled Font and Texture Manipulation [30.654807125764965]
We propose a novel approach, namely GenText, to achieve general artistic text style transfer.
Specifically, our work incorporates three different stages: stylization, destylization, and font transfer.
Considering the difficult data acquisition of paired artistic text images, our model is designed under the unsupervised setting.
arXiv Detail & Related papers (2022-07-20T04:42:47Z)
- Few-Shot Font Generation by Learning Fine-Grained Local Styles [90.39288370855115]
Few-shot font generation (FFG) aims to generate a new font with a few examples.
We propose a new font generation approach by learning 1) the fine-grained local styles from references, and 2) the spatial correspondence between the content and reference glyphs.
arXiv Detail & Related papers (2022-05-20T05:07:05Z)
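For the Orientation-Independent Chinese Text Recognition entry above, the content/orientation disentangling idea behind its CIRN might be sketched as follows; the split heads, the reconstruction decoder, and all dimensions are assumptions rather than the paper's actual design.

```python
import torch
import torch.nn as nn

class DisentangleReconstruct(nn.Module):
    """Illustrative: split features into content/orientation codes; the
    printed-glyph reconstruction is driven by the content code alone."""

    def __init__(self, feat_dim=256, code_dim=128, glyph_size=32):
        super().__init__()
        self.to_content = nn.Linear(feat_dim, code_dim)  # what is written
        self.to_orient = nn.Linear(feat_dim, code_dim)   # how it is oriented
        self.reconstruct = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(),
            nn.Linear(512, glyph_size * glyph_size), nn.Sigmoid(),
        )

    def forward(self, feat):
        content = self.to_content(feat)
        orient = self.to_orient(feat)
        # Reconstructing a canonical printed glyph from the content code
        # alone pressures it to drop orientation information.
        glyph = self.reconstruct(content)
        return glyph, content, orient
```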
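For the Deformation Robust Text Spotting entry, fusing character features with landmark features through a graph convolution can be sketched as below; the mean-aggregation layer, the fusion projection, and all shapes are illustrative assumptions, not DR TextSpotter's exact design.

```python
import torch
import torch.nn as nn

class FusionGCN(nn.Module):
    """Illustrative: one mean-aggregation graph-convolution layer over nodes
    whose features fuse character and landmark information."""

    def __init__(self, dim=256):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)  # merge the two feature types
        self.gcn = nn.Linear(dim, dim)       # shared GCN weight matrix

    def forward(self, char_feats, landmark_feats, adj):
        # char_feats, landmark_feats: (N, dim) per-node features;
        # adj: (N, N) adjacency with self-loops already added.
        x = self.fuse(torch.cat([char_feats, landmark_feats], dim=1))
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        x = (adj @ x) / deg                  # mean aggregation over neighbors
        return torch.relu(self.gcn(x))
```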
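For the VQ-Font entry, the codebook lookup at the heart of VQGAN-style token refinement is a nearest-neighbour quantization. A minimal sketch, with all dimensions assumed:

```python
import torch

def quantize(features, codebook):
    # features: (N, D) encoder outputs; codebook: (K, D) learned code vectors.
    dists = torch.cdist(features, codebook)  # (N, K) pairwise L2 distances
    indices = dists.argmin(dim=1)            # nearest code per feature
    return codebook[indices], indices        # quantized features + token ids
```

Snapping continuous encoder features to discrete codes learned from real glyphs is what lets the codebook act as a prior that closes the gap between synthesized and real-world strokes.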
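For the DS-Font entry, a cluster-level contrastive objective can be approximated with an InfoNCE-style loss in which samples sharing a style cluster are positives; this is a hedged stand-in for the paper's CCS loss, not its exact formulation.

```python
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(embeds, cluster_ids, temperature=0.1):
    # embeds: (N, D) style embeddings; cluster_ids: (N,) font/style labels.
    z = F.normalize(embeds, dim=1)
    sim = z @ z.t() / temperature                          # (N, N)
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))        # drop self-pairs
    log_prob = sim.log_softmax(dim=1)
    pos = (cluster_ids.unsqueeze(0) == cluster_ids.unsqueeze(1)) & ~self_mask
    # Average log-probability of same-cluster (positive) pairs per anchor;
    # anchors with no positives contribute zero loss.
    pos_log_prob = log_prob.masked_fill(~pos, 0.0).sum(dim=1)
    return (-pos_log_prob / pos.sum(dim=1).clamp(min=1)).mean()
```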
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.