Self-Supervised Text Erasing with Controllable Image Synthesis
- URL: http://arxiv.org/abs/2204.12743v1
- Date: Wed, 27 Apr 2022 07:21:55 GMT
- Title: Self-Supervised Text Erasing with Controllable Image Synthesis
- Authors: Gangwei Jiang, Shiyao Wang, Tiezheng Ge, Yuning Jiang, Ying Wei, Defu Lian
- Abstract summary: We study an unsupervised scenario by proposing a novel Self-supervised Text Erasing framework.
We first design a style-aware image synthesis function to generate synthetic images with diverse styled texts.
To bridge the text style gap between the synthetic and real-world data, a policy network is constructed to control the synthetic mechanisms.
The proposed method has been extensively evaluated with both PosterErase and the widely-used SCUT-Entext dataset.
- Score: 33.60862002159276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent efforts on scene text erasing have shown promising results. However,
existing methods require rich yet costly label annotations to obtain robust
models, which limits the use for practical applications. To this end, we study
an unsupervised scenario by proposing a novel Self-supervised Text Erasing
(STE) framework that jointly learns to synthesize training images with erasure
ground-truth and accurately erase texts in the real world. We first design a
style-aware image synthesis function to generate synthetic images with diverse
styled texts based on two synthetic mechanisms. To bridge the text style gap
between the synthetic and real-world data, a policy network is constructed to
control the synthetic mechanisms by picking style parameters with the guidance
of two specifically designed rewards. The synthetic training images with
erasure ground-truth are then fed to train a coarse-to-fine erasing network. To
produce better erasing outputs, a triplet erasure loss is designed to enforce
the refinement stage to recover background textures. Moreover, we provide a new
dataset (called PosterErase), which contains 60K high-resolution posters with
texts and is more challenging for the text erasing task. The proposed method
has been extensively evaluated with both PosterErase and the widely-used
SCUT-EnsText dataset. Notably, on PosterErase, our unsupervised method achieves
an FID of 5.07, a relative improvement of 20.9% over existing supervised
baselines.
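The abstract describes a triplet erasure loss that pushes the refinement stage to recover background textures, but does not give its exact form. Below is a minimal PyTorch sketch of one plausible formulation, in which the refined output is pulled toward the erasure ground truth and pushed away from the coarse prediction; the function name, distance choice, and margin value are all assumptions, not the paper's actual loss:

```python
import torch
import torch.nn.functional as F

def triplet_erasure_loss(refined: torch.Tensor,
                         coarse: torch.Tensor,
                         gt: torch.Tensor,
                         margin: float = 0.1) -> torch.Tensor:
    """Hypothetical triplet-style erasure loss.

    Treats the ground-truth background as the positive and the coarse
    erasing output as the negative, so the refinement stage is rewarded
    for moving beyond the coarse result toward the true background.
    """
    d_pos = F.l1_loss(refined, gt)      # refined output should match the ground truth
    d_neg = F.l1_loss(refined, coarse)  # ...and move away from the coarse prediction
    return F.relu(d_pos - d_neg + margin)
```

In this sketch the loss is zero once the refined image is closer to the ground truth than to the coarse output by at least the margin, which matches the coarse-to-fine intuition in the abstract.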
Related papers
- TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images [84.08181780666698]
TextDestroyer is the first training- and annotation-free method for scene text destruction.
Our method scrambles text areas in the latent start code using a Gaussian distribution before reconstruction.
The advantages of TextDestroyer include: (1) it eliminates labor-intensive data annotation and resource-intensive training; (2) it achieves more thorough text destruction, preventing recognizable traces; and (3) it demonstrates better generalization capabilities, performing well on both real-world scenes and generated images.
arXiv Detail & Related papers (2024-11-01T04:41:00Z)
- Text2Data: Low-Resource Data Generation with Textual Control [104.38011760992637]
Natural language serves as a common and straightforward control signal for humans to interact seamlessly with machines.
We propose Text2Data, a novel approach that utilizes unlabeled data to understand the underlying data distribution through an unsupervised diffusion model.
It undergoes controllable finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
- Generating Non-Stationary Textures using Self-Rectification [70.91414475376698]
This paper addresses the challenge of example-based non-stationary texture synthesis.
We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools.
Our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture.
arXiv Detail & Related papers (2024-01-05T15:07:05Z)
- Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors [54.80516786370663]
FreeReal is a real-domain-aligned pre-training paradigm that enables the complementary strengths of LSD and real data.
GlyphMix embeds synthetic images as graffiti-like units onto real images.
FreeReal consistently outperforms previous pre-training methods by a substantial margin across four public datasets.
arXiv Detail & Related papers (2023-12-08T15:10:55Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Style Generation: Image Synthesis based on Coarsely Matched Texts [10.939482612568433]
We introduce a novel task called text-based style generation and propose a two-stage generative adversarial network.
The first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature.
The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization.
arXiv Detail & Related papers (2023-09-08T21:51:11Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- Progressive Scene Text Erasing with Self-Supervision [7.118419154170154]
Scene text erasing seeks to erase text contents from scene images.
Current state-of-the-art text erasing models are trained on large-scale synthetic data.
We employ self-supervision for feature representation on unlabeled real-world scene text images.
arXiv Detail & Related papers (2022-07-23T09:05:13Z)
- Stroke-Based Scene Text Erasing Using Synthetic Data [0.0]
Scene text erasing can replace text regions with reasonable content in natural images.
The lack of a large-scale real-world scene-text removal dataset prevents existing methods from performing at full strength.
We enhance and make full use of the synthetic text and consequently train our model only on the dataset generated by the improved synthetic text engine.
This model can partially erase text instances in a scene image with a bounding box provided or work with an existing scene text detector for automatic scene text erasing.
arXiv Detail & Related papers (2021-04-23T09:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.