Deformation Robust Text Spotting with Geometric Prior
- URL: http://arxiv.org/abs/2308.16404v1
- Date: Thu, 31 Aug 2023 02:13:15 GMT
- Title: Deformation Robust Text Spotting with Geometric Prior
- Authors: Xixuan Hao, Aozhong Zhang, Xianze Meng and Bin Fu
- Abstract summary: We develop a deformation robust text spotting method (DR TextSpotter) to address the recognition of characters with complex deformations across different fonts.
A graph convolution network is constructed to fuse the character features and landmark features, and then perform semantic reasoning to enhance the discrimination between different characters.
- Score: 5.639053898266709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of text spotting is to perform text detection and recognition in an
end-to-end manner. Although the diversity of luminosity and orientation in
scene texts has been widely studied, the font diversity and shape variance of
the same character are ignored in recent works, since most characters in
natural images are rendered in standard fonts. To solve this problem, we
present a Chinese Artistic Dataset, termed ARText, which contains 33,000
artistic images with rich shape deformation and font diversity. Based on this
database, we develop a deformation robust text spotting method (DR TextSpotter)
to address the recognition of characters with complex deformations across
different fonts. Specifically, we propose a geometric prior module to highlight
the important features based on the unsupervised landmark detection
sub-network. A graph convolution network is further constructed to fuse the
character features and landmark features, and then performs semantic reasoning
to enhance the discrimination for different characters. The experiments are
conducted on ARText and IC19-ReCTS datasets. Our results demonstrate the
effectiveness of our proposed method.
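As a rough illustration of the fusion step described in the abstract, the following is a minimal PyTorch-style sketch of a single graph convolution layer that combines per-character visual features with landmark features before a classification (semantic reasoning) head. The module name, tensor shapes, adjacency construction, and single-layer design are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' released code): fusing per-character
# visual features with landmark features through one graph convolution layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LandmarkCharFusion(nn.Module):
    def __init__(self, char_dim=256, lm_dim=64, hidden_dim=256, num_classes=3755):
        super().__init__()
        # project both feature types into a common node-embedding space
        self.char_proj = nn.Linear(char_dim, hidden_dim)
        self.lm_proj = nn.Linear(lm_dim, hidden_dim)
        # one graph-convolution transform; deeper stacks are possible
        self.gcn_weight = nn.Linear(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, char_feats, lm_feats, adj):
        # char_feats: (N_char, char_dim)  per-character visual features
        # lm_feats:   (N_lm, lm_dim)      landmark features from the prior module
        # adj:        (N, N) row-normalized adjacency over N = N_char + N_lm nodes
        nodes = torch.cat([self.char_proj(char_feats),
                           self.lm_proj(lm_feats)], dim=0)   # (N, hidden_dim)
        # graph convolution: aggregate neighbors, then transform (A X W)
        nodes = F.relu(self.gcn_weight(adj @ nodes))
        # semantic reasoning head: classify only the character nodes
        return self.classifier(nodes[:char_feats.size(0)])

# toy usage with random tensors
if __name__ == "__main__":
    n_char, n_lm = 4, 12
    adj = torch.rand(n_char + n_lm, n_char + n_lm)
    adj = adj / adj.sum(dim=1, keepdim=True)                 # row-normalize
    model = LandmarkCharFusion()
    logits = model(torch.randn(n_char, 256), torch.randn(n_lm, 64), adj)
    print(logits.shape)  # torch.Size([4, 3755])
```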
Related papers
- Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings.
We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features.
Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z) - Research on Multilingual Natural Scene Text Detection Algorithm [4.514028820667202]
We propose a multilingual text detection model to address the issues of low accuracy and high difficulty in detecting multilingual text in natural scenes.
We introduce the SFM Swin Transformer feature extraction network to enhance the model's robustness in detecting characters and fonts across different languages.
We also propose a Global Semantic Branch that extracts and preserves global features for more effective text detection.
arXiv Detail & Related papers (2023-12-18T12:46:35Z) - Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z) - Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval [11.798006331912056]
The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to the given textual descriptions.
We propose a novel TIPR framework to build fine-grained interactions and alignment between person images and the corresponding texts.
arXiv Detail & Related papers (2023-07-18T08:23:46Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Person Text-Image Matching via Text-Feature Interpretability Embedding
and External Attack Node Implantation [22.070781214170164]
Person text-image matching aims to retrieve images of specific pedestrians using text descriptions.
The lack of interpretability of text features makes it challenging to effectively align them with their corresponding image features.
We propose a person text-image matching method by embedding text-feature interpretability and an external attack node.
arXiv Detail & Related papers (2022-11-16T04:15:37Z) - Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z) - Scene Text Image Super-Resolution via Content Perceptual Loss and
Criss-Cross Transformer Blocks [48.81850740907517]
We present TATSR, a Text-Aware Text Super-Resolution framework.
It effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss.
It outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.
arXiv Detail & Related papers (2022-10-13T11:48:45Z) - Toward Understanding WordArt: Corner-Guided Transformer for Scene Text
Recognition [63.6608759501803]
We propose to recognize artistic text at three levels.
Firstly, corner points are applied to guide the extraction of local features inside characters, considering the robustness of corner structures to appearance and shape.
Secondly, we design a character contrastive loss to model the character-level feature, improving the feature representation for character classification (a minimal sketch of such a loss appears after this list).
Thirdly, we utilize Transformer to learn the global feature on image-level and model the global relationship of the corner points.
arXiv Detail & Related papers (2022-07-31T14:11:05Z) - Exploring Font-independent Features for Scene Text Recognition [22.34023249700896]
Scene text recognition (STR) has been extensively studied in the last few years.
Many recently-proposed methods are specially designed to accommodate the arbitrary shape, layout and orientation of scene texts.
These methods, in which the font features and content features of characters are entangled, perform poorly in text recognition on scene images with texts in novel font styles.
arXiv Detail & Related papers (2020-09-16T03:36:59Z)
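Several of the related entries above (the corner-guided WordArt recognizer and the self-supervised CCD method) rely on character-level contrastive objectives. As a hypothetical sketch only, the following implements such a loss in the common InfoNCE/NT-Xent form over per-character embeddings from two augmented views; the exact formulations in those papers may differ, and all names and hyperparameters here are illustrative.

```python
# Hypothetical character-level contrastive loss (InfoNCE form), not taken
# from any of the papers listed above.
import torch
import torch.nn.functional as F

def character_contrastive_loss(feats_a, feats_b, temperature=0.1):
    """feats_a, feats_b: (N, D) embeddings of the same N characters under two views."""
    a = F.normalize(feats_a, dim=1)
    b = F.normalize(feats_b, dim=1)
    logits = a @ b.t() / temperature          # (N, N) cosine-similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # each character embedding should match its own counterpart in the other view
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# toy usage
if __name__ == "__main__":
    x = torch.randn(8, 256)
    loss = character_contrastive_loss(x + 0.01 * torch.randn_like(x), x)
    print(loss.item())
```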