The Surprisingly Straightforward Scene Text Removal Method With Gated
Attention and Region of Interest Generation: A Comprehensive Prominent Model
Analysis
- URL: http://arxiv.org/abs/2210.07489v1
- Date: Fri, 14 Oct 2022 03:34:21 GMT
- Title: The Surprisingly Straightforward Scene Text Removal Method With Gated
Attention and Region of Interest Generation: A Comprehensive Prominent Model
Analysis
- Authors: Hyeonsu Lee, Chankyu Choi
- Abstract summary: Scene text removal (STR) is a task of erasing text from natural scene images.
We introduce a simple yet extremely effective Gated Attention (GA) and Region-of-Interest Generation (RoIG) methodology in this paper.
Experimental results on the benchmark dataset show that our method significantly outperforms existing state-of-the-art methods in almost all metrics.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene text removal (STR), a task of erasing text from natural scene images,
has recently attracted attention as an important component of editing text or
concealing private information such as ID, telephone, and license plate
numbers. While a variety of STR methods are being actively researched, it is
difficult to evaluate their relative merits because previously proposed
methods do not use the same standardized training/evaluation dataset. We use
the same standardized training/testing dataset to evaluate the performance of
several previous methods after standardized re-implementation. We also
introduce a simple yet extremely effective Gated Attention (GA) and
Region-of-Interest Generation (RoIG) methodology in this paper. GA uses
attention to focus on the text stroke as well as the textures and colors of the
surrounding regions to remove text from the input image much more precisely.
RoIG is applied to focus on only the region with text instead of the entire
image to train the model more efficiently. Experimental results on the
benchmark dataset show that our method significantly outperforms existing
state-of-the-art methods in almost all metrics with remarkably higher-quality
results. Furthermore, because our model does not generate a text stroke mask
explicitly, there is no need for additional refinement steps or sub-models,
making our model extremely fast with fewer parameters. The dataset and code are
available at https://github.com/naver/garnet.
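The abstract describes two mechanisms: Gated Attention blends attention over text strokes with attention over the surrounding textures and colors, and Region-of-Interest Generation restricts training to the text regions rather than the whole image. The sketch below illustrates both ideas in plain numpy; it is not the authors' implementation, and the gate scalar `alpha`, the logit maps, and the helper names are illustrative assumptions (in the real model these would be produced by learned convolutional layers).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention(features, stroke_logits, surround_logits, alpha=0.5):
    """Illustrative Gated Attention: blend a text-stroke attention map with
    a surrounding-region attention map via a gate, then reweight features.

    `alpha` stands in for a learned gate; both logit maps would come from
    conv layers in an actual STR network.
    """
    stroke_attn = sigmoid(stroke_logits)      # attends to text strokes
    surround_attn = sigmoid(surround_logits)  # attends to nearby texture/color
    attn = alpha * stroke_attn + (1.0 - alpha) * surround_attn
    return features * attn                    # reweighted feature map

def roi_masked_loss(pred, target, text_region_mask):
    """Illustrative RoIG-style loss: average the reconstruction error only
    inside the text region mask, so training focuses on the area that
    actually needs in-painting instead of the entire image.
    """
    diff = np.abs(pred - target) * text_region_mask
    return diff.sum() / max(text_region_mask.sum(), 1.0)
```

A masked loss like this also explains the efficiency claim: gradients from easy, text-free background pixels are zeroed out, so model capacity is spent on the erased regions.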
Related papers
- Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling [44.70973195966149]
The existing scene text removal (STR) task suffers from insufficient training data due to expensive pixel-level labeling.
We introduce a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels.
Our method outperforms other pretraining methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText).
arXiv Detail & Related papers (2024-09-20T11:52:57Z) - EAFormer: Scene Text Segmentation with Edge-Aware Transformers [56.15069996649572]
Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts.
We propose Edge-Aware Transformers, EAFormer, to segment texts more accurately, especially at the edge of texts.
arXiv Detail & Related papers (2024-07-24T06:00:33Z) - Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z) - SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z) - Towards End-to-End Unified Scene Text Detection and Layout Analysis [60.68100769639923]
We introduce the task of unified scene text detection and layout analysis.
The first hierarchical scene text dataset is introduced to enable this novel research task.
We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way.
arXiv Detail & Related papers (2022-03-28T23:35:45Z) - Stroke-Based Scene Text Erasing Using Synthetic Data [0.0]
Scene text erasing can replace text regions with reasonable content in natural images.
The lack of a large-scale real-world scene text removal dataset prevents existing methods from working at full strength.
We enhance and make full use of the synthetic text and consequently train our model only on the dataset generated by the improved synthetic text engine.
This model can partially erase text instances in a scene image with a bounding box provided or work with an existing scene text detector for automatic scene text erasing.
arXiv Detail & Related papers (2021-04-23T09:29:41Z) - Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts of various shapes and incurs low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weakly labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z) - ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene
Text Detection [147.10751375922035]
We propose the ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method effectively suppresses these false positives by only outputting predictions with high response value in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.