UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
- URL: http://arxiv.org/abs/2003.10608v6
- Date: Tue, 18 Aug 2020 01:06:39 GMT
- Title: UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
- Authors: Shangbang Long, Cong Yao
- Abstract summary: UnrealText is an efficient image synthesis method that renders realistic images via a 3D graphics engine.
Comprehensive experiments verify its effectiveness on both scene text detection and recognition.
We generate a multilingual version for future research into multilingual scene text detection and recognition.
- Score: 18.608641449975124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Synthetic data has been a critical tool for training scene text detection and recognition models. On the one hand, synthetic word images have proven to be a successful substitute for real images in training scene text recognizers. On the other hand, however, scene text detectors still heavily rely on a large amount of manually annotated real-world images, which are expensive. In this paper, we introduce UnrealText, an efficient image synthesis method that renders realistic images via a 3D graphics engine. The 3D synthesis engine provides realistic appearance by rendering scene and text as a whole, and allows for better text region proposals with access to precise scene information, e.g., surface normals and even object meshes. Comprehensive experiments verify its effectiveness on both scene text detection and recognition. We also generate a multilingual version for future research into multilingual scene text detection and recognition. Additionally, we re-annotate scene text recognition datasets in a case-sensitive way and include punctuation marks for more comprehensive evaluations. The code and the generated datasets are released at: https://github.com/Jyouhou/UnrealText/ .
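For intuition only, the following minimal Python sketch (not taken from the released UnrealText code; the function names, window sizes, and angle threshold are illustrative assumptions) shows how per-pixel surface normals exposed by a 3D engine could be used to propose text regions on approximately planar surfaces:

import numpy as np

def propose_flat_regions(normals, window=64, stride=32, max_angle_deg=10.0):
    # Slide a window over an HxWx3 unit-normal map and keep windows whose
    # normals all stay within max_angle_deg of the window's mean direction,
    # i.e. candidate regions that are roughly planar and suitable for text.
    h, w, _ = normals.shape
    cos_thresh = np.cos(np.deg2rad(max_angle_deg))
    proposals = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = normals[y:y + window, x:x + window].reshape(-1, 3)
            mean_n = patch.mean(axis=0)
            mean_n = mean_n / (np.linalg.norm(mean_n) + 1e-8)
            # Cosine similarity of every normal in the patch to the mean normal.
            if (patch @ mean_n).min() >= cos_thresh:
                proposals.append((x, y, window, window))
    return proposals

if __name__ == "__main__":
    # Toy normal map: a flat wall facing the camera, with a noisy corner region.
    normals = np.zeros((256, 256, 3))
    normals[..., 2] = 1.0
    normals[:128, :128] += 0.5 * np.random.randn(128, 128, 3)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    print(len(propose_flat_regions(normals)), "candidate text regions")

In the actual pipeline the engine supplies precise scene information such as normals and object meshes, so region proposals can exploit exact geometry rather than estimates recovered from images.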
Related papers
- CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction [23.683636588751753]
State-of-the-art inpainting methods are mainly designed for natural images and cannot correctly recover text within scene text images.
We identify the visual-text inpainting task to achieve high-quality scene text image restoration and text completion.
arXiv Detail & Related papers (2024-07-23T06:12:19Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors [54.80516786370663]
FreeReal is a real-domain-aligned pre-training paradigm that leverages the complementary strengths of large-scale synthetic data (LSD) and real data.
GlyphMix embeds synthetic images as graffiti-like units onto real images.
FreeReal consistently outperforms previous pre-training methods by a substantial margin across four public datasets.
arXiv Detail & Related papers (2023-12-08T15:10:55Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, the text images we produce consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace a text instance with the desired one while preserving the background and the style of the original text.
Previous methods that edit the whole image have to learn different translation rules for background and text regions simultaneously.
We propose a novel network for MOdifying Scene Text images at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data [4.096453902709292]
Scene-text image synthesis techniques aim to naturally compose text instances on background scene images.
We propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet).
After training, those networks can be integrated and utilized to generate the synthetic dataset for scene text analysis tasks.
arXiv Detail & Related papers (2022-09-06T11:15:58Z)
- Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines implicit contextual knowledge behind scene text images.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when its weights are transferred to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- Stroke-Based Scene Text Erasing Using Synthetic Data [0.0]
Scene text erasing can replace text regions with reasonable content in natural images.
The lack of a large-scale real-world scene-text removal dataset prevents existing methods from reaching their full strength.
We enhance and make full use of synthetic text, training our model only on the dataset generated by the improved synthetic text engine.
This model can partially erase text instances in a scene image with a bounding box provided or work with an existing scene text detector for automatic scene text erasing.
arXiv Detail & Related papers (2021-04-23T09:29:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.