Inverse Scene Text Removal
- URL: http://arxiv.org/abs/2506.21002v1
- Date: Thu, 26 Jun 2025 04:32:35 GMT
- Title: Inverse Scene Text Removal
- Authors: Takumi Yoshimatsu, Shumpei Takezaki, Seiichi Uchida
- Abstract summary: Scene text removal (STR) aims to erase textual elements from images. STR typically detects text regions and then inpaints them. This paper investigates Inverse STR (ISTR), which analyzes STR-processed images and focuses on binary classification.
- Score: 5.892066196730197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text removal (STR) aims to erase textual elements from images. It was originally intended for removing privacy-sensitive or undesired texts from natural scene images, but is now also applied to typographic images. STR typically detects text regions and then inpaints them. Although STR has advanced through neural networks and synthetic data, misuse risks have increased. This paper investigates Inverse STR (ISTR), which analyzes STR-processed images and focuses on binary classification (detecting whether an image has undergone STR) and localizing removed text regions. We demonstrate in experiments that these tasks are achievable with high accuracy, enabling detection of potential misuse and improving STR. We also attempt to recover the removed text content by training a text recognizer to understand its difficulty.
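The localization task described in the abstract can be pictured with a toy, training-free heuristic: inpainted regions tend to be smoother than untouched scene texture, so patches with unusually low high-frequency residual energy hint at text removal. This is only a sketch under that smoothness assumption; the paper trains neural classifiers, and the function names below are hypothetical.

```python
import numpy as np

def residual_energy_map(img, patch=8):
    """Per-patch high-frequency (Laplacian) residual energy.

    Inpainted regions are often overly smooth, so their residual
    energy is lower than that of untouched texture. Toy heuristic,
    not the learned model from the paper.
    """
    # 4-neighbor Laplacian high-pass filter (edges wrap via np.roll)
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    h, w = img.shape
    hp, wp = h // patch, w // patch
    tiles = lap[:hp * patch, :wp * patch].reshape(hp, patch, wp, patch)
    return (tiles ** 2).mean(axis=(1, 3))

def localize_removed(img, patch=8, factor=0.2):
    """Flag patches whose residual energy falls below factor * median."""
    e = residual_energy_map(img, patch)
    return e < factor * np.median(e)
```

On a textured image where one block has been replaced by a flat fill, the flagged patches line up with the filled block; a learned classifier would replace the hand-set threshold.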
Related papers
- Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling [44.70973195966149]
Existing scene text removal (STR) task suffers from insufficient training data due to the expensive pixel-level labeling.
We introduce a Text-aware Masked Image Modeling algorithm (TMIM), which can pretrain STR models with low-cost text detection labels.
Our method outperforms other pretrain methods and achieves state-of-the-art performance (37.35 PSNR on SCUT-EnsText).
arXiv Detail & Related papers (2024-09-20T11:52:57Z) - DeepEraser: Deep Iterative Context Mining for Generic Text Eraser [103.39279154750172]
DeepEraser is a recurrent architecture that erases the text in an image via iterative operations.
DeepEraser is notably compact with only 1.4M parameters and trained in an end-to-end manner.
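The iterative-erasing idea behind DeepEraser can be pictured with a classical stand-in: given a known text mask, repeatedly diffusing surrounding background values into the masked region removes the text over many small steps. This is a minimal diffusion-inpainting sketch, not DeepEraser's learned recurrent network; the function name is hypothetical.

```python
import numpy as np

def iterative_erase(img, mask, steps=200):
    """Erase masked pixels by repeated neighborhood averaging.

    Each step replaces every masked pixel with the mean of its four
    neighbors (Jacobi diffusion), so background values gradually flow
    into the text region -- a classical analogue of iterative erasing.
    """
    out = img.astype(float).copy()
    for _ in range(steps):
        avg = (np.roll(out, 1, 0) + np.roll(out, -1, 0)
               + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 4.0
        out[mask] = avg[mask]  # only masked pixels are updated
    return out
```

A learned eraser replaces the fixed averaging with a network that predicts each update, which is what lets DeepEraser handle textured backgrounds rather than only flat ones.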
arXiv Detail & Related papers (2024-02-29T12:39:04Z) - Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z) - Selective Scene Text Removal [12.03150391651337]
Scene text removal (STR) is the image transformation task of removing text regions from scene images.
We propose a novel task setting named selective scene text removal (SSTR) that removes only target words specified by the user.
arXiv Detail & Related papers (2023-09-01T12:07:40Z) - ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining [58.241008246380254]
Scene text removal (STR) aims at replacing text strokes in natural scenes with visually coherent backgrounds.
Recent STR approaches rely on iterative refinements or explicit text masks, resulting in high complexity and sensitivity to the accuracy of text localization.
We propose a simple-yet-effective ViT-based text eraser, dubbed ViTEraser.
arXiv Detail & Related papers (2023-06-21T08:47:20Z) - PSSTRNet: Progressive Segmentation-guided Scene Text Removal Network [1.7259824817932292]
Scene text removal (STR) is a challenging task due to the complex text fonts, colors, sizes, and background textures in scene images.
We propose a Progressive-guided Scene Text Removal Network (PSSTRNet) to remove the text in the image iteratively.
arXiv Detail & Related papers (2023-06-13T15:20:37Z) - Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks [48.81850740907517]
We present TATSR, a Text-Aware Text Super-Resolution framework.
It effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss.
It outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.
arXiv Detail & Related papers (2022-10-13T11:48:45Z) - Image-Specific Information Suppression and Implicit Local Alignment for Text-based Person Search [61.24539128142504]
Text-based person search (TBPS) is a challenging task that aims to search pedestrian images with the same identity from an image gallery given a query text.
Most existing methods rely on explicitly generated local parts to model fine-grained correspondence between modalities.
We propose an efficient joint Multi-level Alignment Network (MANet) for TBPS, which can learn aligned image/text feature representations between modalities at multiple levels.
arXiv Detail & Related papers (2022-08-30T16:14:18Z) - A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution [13.934846626570286]
Scene text image super-resolution aims to increase the resolution and readability of the text in low-resolution images.
It remains difficult to reconstruct high-resolution images for spatially deformed texts, especially rotated and curve-shaped ones.
We propose a CNN-based Text ATTention network (TATT) to address this problem.
arXiv Detail & Related papers (2022-03-17T15:28:29Z) - CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning [65.57338873921168]
Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision.
In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module.
We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text.
arXiv Detail & Related papers (2021-12-14T16:22:25Z) - A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks [72.32357172679319]
This paper presents a novel ProgrEssively Region-based scene Text eraser (PERT).
PERT decomposes the STR task to several erasing stages.
PERT introduces a region-based modification strategy to ensure the integrity of text-free areas.
arXiv Detail & Related papers (2021-06-24T14:06:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.