Stroke-Based Scene Text Erasing Using Synthetic Data
- URL: http://arxiv.org/abs/2104.11493v1
- Date: Fri, 23 Apr 2021 09:29:41 GMT
- Title: Stroke-Based Scene Text Erasing Using Synthetic Data
- Authors: Zhengmi Tang, Tomo Miyazaki, Yoshihiro Sugaya, and Shinichiro Omachi
- Abstract summary: Scene text erasing can replace text regions with reasonable content in natural images.
The lack of a large-scale real-world scene-text removal dataset prevents existing methods from working at full strength.
We enhance and make full use of the synthetic text and consequently train our model only on the dataset generated by the improved synthetic text engine.
This model can partially erase text instances in a scene image with a bounding box provided or work with an existing scene text detector for automatic scene text erasing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text erasing, which replaces text regions with reasonable content in
natural images, has drawn attention in the computer vision community in recent
years. There are two potential subtasks in scene text erasing: text detection
and image inpainting. Either subtask requires considerable data to achieve
good performance; however, the lack of a large-scale real-world scene-text
removal dataset prevents existing methods from reaching their full strength. To
avoid this limitation, we enhance and make full use of synthetic text and
consequently train our model only on
the dataset generated by the improved synthetic text engine. Our proposed
network contains a stroke mask prediction module and background inpainting
module that can extract the text stroke as a relatively small hole from the
text image patch to maintain more background content for better inpainting
results. This model can partially erase text instances in a scene image with a
bounding box provided or work with an existing scene text detector for
automatic scene text erasing. The experimental results of qualitative
evaluation and quantitative evaluation on the SCUT-Syn, ICDAR2013, and
SCUT-EnsText datasets demonstrate that our method significantly outperforms
existing state-of-the-art methods, even though those methods are trained on real-world data.
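The core idea of the proposed network is that predicting a stroke-level mask leaves a much smaller hole than erasing the whole text box, so more background survives to guide inpainting. The paper's modules are learned CNNs trained on synthetic data; the toy sketch below (all names and the neighbour-averaging fill are illustrative assumptions, not the authors' method) only shows how a stroke mask confines the region that must be filled:

```python
import numpy as np

def erase_strokes(patch: np.ndarray, stroke_mask: np.ndarray,
                  iters: int = 50) -> np.ndarray:
    """Fill only the masked stroke pixels by repeatedly averaging their
    4-neighbours; every background pixel is left untouched.
    (np.roll wraps at the border, which is fine for this interior-hole toy.)"""
    out = patch.astype(float).copy()
    hole = stroke_mask.astype(bool)
    for _ in range(iters):
        up    = np.roll(out, 1, axis=0)
        down  = np.roll(out, -1, axis=0)
        left  = np.roll(out, 1, axis=1)
        right = np.roll(out, -1, axis=1)
        avg = (up + down + left + right) / 4.0
        out[hole] = avg[hole]  # update hole (stroke) pixels only
    return out

# Toy example: flat grey background with a single bright "stroke" pixel.
patch = np.full((5, 5), 100.0)
patch[2, 2] = 255.0
mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True            # stroke-level hole, not the whole patch
restored = erase_strokes(patch, mask)
```

Because only the stroke pixels are treated as a hole, the surrounding background is preserved exactly, which is the property the abstract credits for the better inpainting results.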
Related papers
- WAS: Dataset and Methods for Artistic Text Segmentation [57.61335995536524]
This paper focuses on the more challenging task of artistic text segmentation and constructs a real artistic text segmentation dataset.
We propose a decoder with the layer-wise momentum query to prevent the model from ignoring stroke regions of special shapes.
We also propose a skeleton-assisted head to guide the model to focus on the global structure.
arXiv Detail & Related papers (2024-07-31T18:29:36Z)
- CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction [23.683636588751753]
State-of-the-art inpainting methods are mainly designed for natural images and cannot correctly recover text within scene text images.
We identify the visual-text inpainting task to achieve high-quality scene text image restoration and text completion.
arXiv Detail & Related papers (2024-07-23T06:12:19Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods that edit the whole image must learn translation rules for background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data [4.096453902709292]
Scene-text image synthesis techniques aim to naturally compose text instances on background scene images.
We propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet).
After training, those networks can be integrated and utilized to generate the synthetic dataset for scene text analysis tasks.
arXiv Detail & Related papers (2022-09-06T11:15:58Z) - Progressive Scene Text Erasing with Self-Supervision [7.118419154170154]
Scene text erasing seeks to erase text contents from scene images.
Current state-of-the-art text erasing models are trained on large-scale synthetic data.
We employ self-supervision for feature representation on unlabeled real-world scene text images.
arXiv Detail & Related papers (2022-07-23T09:05:13Z) - Language Matters: A Weakly Supervised Pre-training Approach for Scene
Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when its weights are transferred to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z) - Scene text removal via cascaded text stroke detection and erasing [19.306751704904705]
Recent learning-based approaches show promising performance improvement for scene text removal task.
We propose a novel "end-to-end" framework based on accurate text stroke detection.
arXiv Detail & Related papers (2020-11-19T11:05:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.