FETNet: Feature Erasing and Transferring Network for Scene Text Removal
- URL: http://arxiv.org/abs/2306.09593v1
- Date: Fri, 16 Jun 2023 02:38:30 GMT
- Title: FETNet: Feature Erasing and Transferring Network for Scene Text Removal
- Authors: Guangtao Lyu, Kun Liu, Anna Zhu, Seiichi Uchida, Brian Kenji Iwana
- Abstract summary: Scene text removal (STR) task aims to remove text regions and recover the background smoothly in images for private information protection.
Most existing STR methods adopt encoder-decoder-based CNNs, with direct copies of the features in the skip connections.
We propose a novel Feature Erasing and Transferring (FET) mechanism to reconfigure the encoded features for STR.
- Score: 14.763369952265796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The scene text removal (STR) task aims to remove text regions and recover the
background smoothly in images for private information protection. Most existing
STR methods adopt encoder-decoder-based CNNs, with direct copies of the
features in the skip connections. However, the encoded features contain both
text texture and structure information. The insufficient utilization of text
features hampers the performance of background reconstruction in text removal
regions. To tackle these problems, we propose a novel Feature Erasing and
Transferring (FET) mechanism to reconfigure the encoded features for STR in
this paper. In FET, a Feature Erasing Module (FEM) is designed to erase text
features. An attention module is responsible for generating the feature
similarity guidance. The Feature Transferring Module (FTM) is introduced to
transfer the corresponding features in different layers based on the attention
guidance. With this mechanism, a one-stage, end-to-end trainable network called
FETNet is constructed for scene text removal. In addition, to facilitate
research on both scene text removal and segmentation tasks, we introduce a
novel dataset, Flickr-ST, with multi-category annotations. Extensive
experiments and ablation studies are conducted on public datasets and
Flickr-ST. Our proposed method achieves state-of-the-art performance on most
metrics, with remarkably higher-quality scene text removal results. The source
code of our work is available at:
https://github.com/GuangtaoLyu/FETNet
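
For readers who want a concrete picture of the FET mechanism summarized above, the sketch below shows one way the erase-then-transfer idea could be realized in PyTorch. The module names, tensor shapes, cosine-similarity attention, and the externally supplied text mask are illustrative assumptions, not the released FETNet code; see the repository linked above for the authors' implementation.

```python
# Minimal sketch of a Feature Erasing and Transferring (FET) style skip connection.
# Assumptions: a text-region mask is available, and attention is plain cosine
# similarity between spatial locations of one feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureErasingModule(nn.Module):
    """Suppress encoder features inside (predicted) text regions."""

    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor, text_mask: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); text_mask: (B, 1, H, W), 1 inside text, 0 for background.
        erased = feat * (1.0 - text_mask)   # zero out features in text regions
        return self.refine(erased)          # lightly smooth the erased feature map


class FeatureTransferringModule(nn.Module):
    """Refill erased positions with background features weighted by similarity."""

    def forward(self, feat: torch.Tensor, text_mask: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        flat = feat.flatten(2)                               # (B, C, HW)
        normed = F.normalize(flat, dim=1)
        sim = torch.bmm(normed.transpose(1, 2), normed)      # (B, HW, HW) cosine similarity
        # Only background locations may act as sources of transferred features
        # (assumes at least one background location exists per image).
        bg = (1.0 - text_mask).flatten(2)                    # (B, 1, HW)
        sim = sim.masked_fill(bg < 0.5, float("-inf"))       # mask out text sources
        attn = sim.softmax(dim=-1)                           # similarity guidance
        transferred = torch.bmm(flat, attn.transpose(1, 2)).view(b, c, h, w)
        # Keep original features outside text; use transferred features inside text.
        return feat * (1.0 - text_mask) + transferred * text_mask


# Hypothetical usage inside one skip connection of an encoder-decoder STR network.
feat = torch.randn(2, 64, 32, 32)                # encoder features
mask = (torch.rand(2, 1, 32, 32) > 0.8).float()  # assumed text-region mask
fem, ftm = FeatureErasingModule(64), FeatureTransferringModule()
reconfigured = ftm(fem(feat, mask), mask)        # features handed to the decoder
```

The point this sketch tries to capture is that erased text positions are refilled only from background positions, so the decoder receives skip features that no longer carry text texture.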
Related papers
- Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling [44.70973195966149]
The scene text removal (STR) task suffers from insufficient training data due to expensive pixel-level labeling.
We introduce a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels.
Our method outperforms other pretraining methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText).
arXiv Detail & Related papers (2024-09-20T11:52:57Z)
- DeepEraser: Deep Iterative Context Mining for Generic Text Eraser [103.39279154750172]
DeepEraser is a recurrent architecture that erases the text in an image via iterative operations.
DeepEraser is notably compact with only 1.4M parameters and trained in an end-to-end manner.
arXiv Detail & Related papers (2024-02-29T12:39:04Z)
- ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining [58.241008246380254]
Scene text removal (STR) aims at replacing text strokes in natural scenes with visually coherent backgrounds.
Recent STR approaches rely on iterative refinements or explicit text masks, resulting in high complexity and sensitivity to the accuracy of text localization.
We propose a simple-yet-effective ViT-based text eraser, dubbed ViTEraser.
arXiv Detail & Related papers (2023-06-21T08:47:20Z)
- PSSTRNet: Progressive Segmentation-guided Scene Text Removal Network [1.7259824817932292]
Scene text removal (STR) is a challenging task due to the complex text fonts, colors, sizes, and background textures in scene images.
We propose a Progressive Segmentation-guided Scene Text Removal Network (PSSTRNet) to remove the text in the image iteratively.
arXiv Detail & Related papers (2023-06-13T15:20:37Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification [78.45528514468836]
Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions.
We propose a Semantically Self-Aligned Network (SSAN) to handle the above problems.
To expedite future research in text-to-image ReID, we build a new database named ICFG-PEDES.
arXiv Detail & Related papers (2021-07-27T08:26:47Z)
- A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks [72.32357172679319]
This paper presents a novel ProgrEssively Region-based scene Text eraser (PERT).
PERT decomposes the STR task to several erasing stages.
PERT introduces a region-based modification strategy to ensure the integrity of text-free areas.
arXiv Detail & Related papers (2021-06-24T14:06:06Z)