Selective Scene Text Removal
- URL: http://arxiv.org/abs/2309.00410v2
- Date: Tue, 3 Oct 2023 07:05:03 GMT
- Title: Selective Scene Text Removal
- Authors: Hayato Mitani, Akisato Kimura, Seiichi Uchida
- Abstract summary: Scene text removal (STR) is the image transformation task of removing text regions from scene images.
We propose a novel task setting named selective scene text removal (SSTR) that removes only target words specified by the user.
- Score: 12.03150391651337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text removal (STR) is the image transformation task of removing
text regions from scene images. Conventional STR methods remove all scene text,
which means that existing methods cannot select which text to remove. In this
paper, we propose a novel task setting named selective scene text removal
(SSTR) that removes only target words specified by the user. Although SSTR is a
more complex task than STR, the proposed multi-module structure enables
efficient training for SSTR. Experimental results show that the proposed method
can remove target words as expected.
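The selective-removal idea can be sketched as a small interface: given detected words (bounding boxes plus recognized strings), keep only the regions whose text matches the user-specified target words, then hand those regions to an inpainting module. This is a hypothetical illustration of the task setting, not the paper's actual multi-module implementation; all names and data shapes below are invented.

```python
# Hypothetical sketch of selective scene text removal (SSTR):
# only regions whose recognized text matches a user-specified
# target word are slated for removal/inpainting.

def select_target_regions(detections, target_words):
    """Return bounding boxes whose recognized text matches a target word."""
    targets = {w.lower() for w in target_words}
    return [box for box, word in detections if word.lower() in targets]

# Toy detections: (bounding box, recognized word) pairs.
detections = [((10, 10, 60, 30), "SALE"), ((80, 10, 140, 30), "EXIT")]
regions = select_target_regions(detections, ["sale"])
print(regions)  # only the "SALE" box is selected; "EXIT" is preserved
```

A conventional STR method would erase both words; the selection step above is what distinguishes the SSTR task setting.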
Related papers
- Inverse Scene Text Removal [5.892066196730197]
Scene text removal (STR) aims to erase textual elements from images. STR typically detects text regions and then inpaints them. This paper investigates Inverse STR (ISTR), which analyzes STR-processed images and focuses on binary classification.
arXiv Detail & Related papers (2025-06-26T04:32:35Z)
- Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling [44.70973195966149]
The existing scene text removal (STR) task suffers from insufficient training data due to expensive pixel-level labeling.
We introduce a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels.
Our method outperforms other pretraining methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText).
arXiv Detail & Related papers (2024-09-20T11:52:57Z)
- DeepEraser: Deep Iterative Context Mining for Generic Text Eraser [103.39279154750172]
DeepEraser is a recurrent architecture that erases the text in an image via iterative operations.
DeepEraser is notably compact with only 1.4M parameters and trained in an end-to-end manner.
arXiv Detail & Related papers (2024-02-29T12:39:04Z)
- ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining [58.241008246380254]
Scene text removal (STR) aims at replacing text strokes in natural scenes with visually coherent backgrounds.
Recent STR approaches rely on iterative refinements or explicit text masks, resulting in high complexity and sensitivity to the accuracy of text localization.
We propose a simple-yet-effective ViT-based text eraser, dubbed ViTEraser.
arXiv Detail & Related papers (2023-06-21T08:47:20Z)
- FETNet: Feature Erasing and Transferring Network for Scene Text Removal [14.763369952265796]
Scene text removal (STR) task aims to remove text regions and recover the background smoothly in images for private information protection.
Most existing STR methods adopt encoder-decoder-based CNNs, with direct copies of the features in the skip connections.
We propose a novel Feature Erasing and Transferring (FET) mechanism to reconfigure the encoded features for STR.
arXiv Detail & Related papers (2023-06-16T02:38:30Z)
- PSSTRNet: Progressive Segmentation-guided Scene Text Removal Network [1.7259824817932292]
Scene text removal (STR) is a challenging task due to the complex text fonts, colors, sizes, and background textures in scene images.
We propose a Progressive Segmentation-guided Scene Text Removal Network (PSSTRNet) to remove text from the image iteratively.
arXiv Detail & Related papers (2023-06-13T15:20:37Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with desired content while preserving the background and the style of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval [66.66400551173619]
We propose a full transformer architecture to unify cross-modal retrieval scenarios in a single Vision and Scene Text Aggregation (ViSTA) framework.
We develop dual contrastive learning losses to embed both image-text pairs and fusion-text pairs into a common cross-modal space.
Experimental results show that ViSTA outperforms other methods by at least 8.4% at Recall@1 for the scene text aware retrieval task.
arXiv Detail & Related papers (2022-03-31T03:40:21Z)
- A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks [72.32357172679319]
This paper presents a novel ProgrEssively Region-based scene Text eraser (PERT).
PERT decomposes the STR task to several erasing stages.
PERT introduces a region-based modification strategy to ensure the integrity of text-free areas.
arXiv Detail & Related papers (2021-06-24T14:06:06Z)
- Scene Text Retrieval via Joint Text Detection and Similarity Learning [68.24531728554892]
Scene text retrieval aims to localize and search all text instances from an image gallery, which are the same or similar to a given query text.
We address this problem by directly learning a cross-modal similarity between a query text and each text instance from natural images.
In this way, scene text retrieval can be simply performed by ranking the detected text instances with the learned similarity.
arXiv Detail & Related papers (2021-04-04T07:18:38Z)
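The retrieval-by-ranking idea in the last entry can be illustrated with a toy sketch: detected text instances are embedded, and retrieval reduces to sorting them by similarity to the query embedding. The embeddings and values below are invented for illustration; the paper learns the cross-modal similarity jointly with detection.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy query embedding and per-instance embeddings (hypothetical values).
query_emb = [1.0, 0.0]
instances = {"coffee": [0.9, 0.1], "hotel": [0.1, 0.9]}

# Retrieval = ranking detected instances by learned similarity to the query.
ranked = sorted(instances, key=lambda k: cosine(query_emb, instances[k]),
                reverse=True)
print(ranked)  # ['coffee', 'hotel']
```

The point of the joint formulation is that the same ranking step works for any query text, without a separate recognition-then-string-matching stage.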
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.