Exploring Stroke-Level Modifications for Scene Text Editing
- URL: http://arxiv.org/abs/2212.01982v1
- Date: Mon, 5 Dec 2022 02:10:59 GMT
- Title: Exploring Stroke-Level Modifications for Scene Text Editing
- Authors: Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, Yuxin Wang, Yongdong Zhang
- Abstract summary: Scene text editing (STE) aims to replace text with the desired one while preserving the background and style of the original text.
Previous methods, which edit the whole image, have to learn different translation rules for background and text regions simultaneously.
We propose a novel network that MOdifies Scene Text images at the strokE Level (MOSTEL).
- Score: 86.33216648792964
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text editing (STE) aims to replace text with the desired one
while preserving the background and style of the original text. However, due to
complicated background textures and varied text styles, existing methods fall
short of generating clear and legible edited text images. In this study, we
attribute the poor editing performance to two problems: 1) Implicit decoupling
structure. Previous methods, which edit the whole image, have to learn
different translation rules for background and text regions simultaneously.
2) Domain gap. Due to the lack of edited real scene text images, the network
can only be well trained on synthetic pairs and performs poorly on real-world
images. To handle these problems, we propose a novel network that MOdifies
Scene Text images at the strokE Level (MOSTEL). First, we generate stroke
guidance maps to explicitly indicate the regions to be edited. Unlike the
implicit approach of directly modifying all pixels at the image level, such
explicit instructions filter out distractions from the background and guide the
network to focus on the editing rules of text regions. Second, we propose a
Semi-supervised Hybrid Learning scheme to train the network with both labeled
synthetic images and unpaired real scene text images, so that the STE model
adapts to real-world data distributions. Moreover, two new datasets
(Tamper-Syn2k and Tamper-Scene) are proposed to fill the gap in public
evaluation datasets. Extensive experiments demonstrate that MOSTEL outperforms
previous methods both qualitatively and quantitatively. Datasets and code will
be available at https://github.com/qqqyd/MOSTEL.
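To make the abstract's two ideas concrete, below is a minimal, hypothetical sketch (not the authors' code; every function and variable name is invented for illustration). It shows (a) how a stroke guidance map can gate which pixels are replaced by rendered target text, and (b) how a hybrid objective might combine a supervised loss on labeled synthetic pairs with an unsupervised reconstruction loss on unpaired real images.

```python
# Hypothetical sketch of stroke-guided editing and a hybrid objective
# (illustrative only; not the MOSTEL implementation).
import numpy as np

def compose_edit(background, rendered_text, stroke_guidance):
    """Blend rendered target text into the background using a stroke map.

    background      : (H, W, 3) reconstructed, text-erased background
    rendered_text   : (H, W, 3) target text rendered in the source style
    stroke_guidance : (H, W, 1) soft mask, ~1 on stroke pixels, ~0 elsewhere
    """
    # Only stroke pixels are replaced, so background translation rules never
    # mix with text-editing rules.
    return stroke_guidance * rendered_text + (1.0 - stroke_guidance) * background

def hybrid_loss(pred_syn, gt_syn, recon_real, real_img, weight=1.0):
    """Schematic semi-supervised hybrid objective.

    Labeled synthetic pairs give a direct pixel loss; unpaired real images can
    only supervise reconstruction of their own (unedited) text, which adapts
    the model to real-world distributions.
    """
    supervised = np.abs(pred_syn - gt_syn).mean()        # synthetic pairs
    unsupervised = np.abs(recon_real - real_img).mean()  # unpaired real images
    return supervised + weight * unsupervised
```

In a full system, the guidance map, background, and text foreground would presumably all be predicted by the network; the sketch only illustrates how an explicit stroke mask separates background reconstruction from text editing.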
Related papers
- ZONE: Zero-Shot Instruction-Guided Local Editing [56.56213730578504]
We propose a Zero-shot instructiON-guided local image Editing approach, termed ZONE.
We first convert the editing intent from the user-provided instruction into specific image editing regions through InstructPix2Pix.
We then propose a Region-IoU scheme for precise image layer extraction from an off-the-shelf segmentation model (a minimal IoU sketch follows at the end of this list).
arXiv Detail & Related papers (2023-12-28T02:54:34Z) - Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
Even with fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z) - Style Generation: Image Synthesis based on Coarsely Matched Texts [10.939482612568433]
We introduce a novel task called text-based style generation and propose a two-stage generative adversarial network.
The first stage generates the overall image style with a sentence feature, and the second stage refines the generated style with a synthetic feature.
The practical potential of our work is demonstrated by various applications such as text-image alignment and story visualization.
arXiv Detail & Related papers (2023-09-08T21:51:11Z) - Improving Diffusion Models for Scene Text Editing with Dual Encoders [44.12999932588205]
Scene text editing is a challenging task that involves modifying or inserting specified texts in an image.
Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing.
We propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design.
arXiv Detail & Related papers (2023-04-12T02:08:34Z) - StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [86.92711729969488]
We exploit the strong capabilities of pretrained diffusion models for image editing.
Existing approaches either finetune the model or invert the image into the latent space of the pretrained model.
Both suffer from two problems: unsatisfying results for selected regions, and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z) - ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation [97.36550187238177]
We study a novel task on text-guided image manipulation on the entity level in the real world.
The task imposes three basic requirements, (1) to edit the entity consistent with the text descriptions, (2) to preserve the text-irrelevant regions, and (3) to merge the manipulated entity into the image naturally.
Our framework incorporates a semantic alignment module to locate the image regions to be manipulated, and a semantic loss to help align the relationship between vision and language.
arXiv Detail & Related papers (2022-04-09T09:01:19Z) - RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image [17.715320405808935]
Scene text editing (STE) is a challenging task due to the complex interplay between text and style.
We propose a novel representational learning-based STE model, referred to as RewriteNet.
Our experiments demonstrate that RewriteNet achieves better quantitative and qualitative performance than competing methods.
arXiv Detail & Related papers (2021-07-23T06:32:58Z) - SwapText: Image Based Texts Transfer in Scenes [13.475726959175057]
We present SwapText, a framework to transfer texts across scene images.
A novel text swapping network is proposed to replace text labels only in the foreground image.
A fusion network then combines the generated foreground and background images to produce the final word image.
arXiv Detail & Related papers (2020-03-18T11:02:17Z)
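As noted in the ZONE entry above, an IoU-based region selection can be sketched as follows: given a rough edit region derived from the instruction-guided edit, pick the segmentation mask from an off-the-shelf model that overlaps it best. This is a hypothetical illustration of that idea, not the paper's implementation; all names are invented.

```python
# Hypothetical sketch of IoU-based layer selection in the spirit of ZONE's
# Region-IoU scheme (illustrative only).
import numpy as np

def iou(mask_a, mask_b):
    """IoU between two boolean masks of the same shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def select_edit_layer(edit_region, segment_masks):
    """Return the segmentation mask that best overlaps the rough edit region."""
    scores = [iou(edit_region, m) for m in segment_masks]
    best = int(np.argmax(scores))
    return segment_masks[best], scores[best]
```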