RewriteNet: Realistic Scene Text Image Generation via Editing Text in
Real-world Image
- URL: http://arxiv.org/abs/2107.11041v1
- Date: Fri, 23 Jul 2021 06:32:58 GMT
- Title: RewriteNet: Realistic Scene Text Image Generation via Editing Text in
Real-world Image
- Authors: Junyeop Lee, Yoonsik Kim, Seonghyeon Kim, Moonbin Yim, Seung Shin,
Gayoung Lee, Sungrae Park
- Abstract summary: Scene text editing (STE) is a challenging task due to a complex intervention between text and style.
We propose a novel representational learning-based STE model, referred to as RewriteNet.
Our experiments demonstrate that RewriteNet achieves better quantitative and qualitative performance than other comparisons.
- Score: 17.715320405808935
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Scene text editing (STE), which converts a text in a scene image into the
desired text while preserving an original style, is a challenging task due to a
complex intervention between text and style. To address this challenge, we
propose a novel representational learning-based STE model, referred to as
RewriteNet that employs textual information as well as visual information. We
assume that the scene text image can be decomposed into content and style
features where the former represents the text information and style represents
scene text characteristics such as font, alignment, and background. Under this
assumption, we propose a method to separately encode content and style features
of the input image by introducing the scene text recognizer that is trained by
text information. Then, a text-edited image is generated by combining the style
feature from the original image and the content feature from the target text.
Unlike previous works that are only able to use synthetic images in the
training phase, we also exploit real-world images by proposing a
self-supervised training scheme, which bridges the domain gap between synthetic
and real data. Our experiments demonstrate that RewriteNet achieves better
quantitative and qualitative performance than competing methods. Moreover, we
validate that the use of text information and the self-supervised training
scheme improves text switching performance. The implementation and dataset will
be publicly available.
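The factorization described in the abstract can be illustrated with a minimal sketch. All names below (`encode_style`, `encode_content`, `generate`) are hypothetical stand-ins, not the actual RewriteNet API; the real encoders are learned networks operating on image tensors, with the content path supervised by a scene text recognizer.

```python
def encode_style(image):
    """Stand-in style encoder: everything except the text itself
    (font, alignment, background)."""
    return {k: v for k, v in image.items() if k != "text"}

def encode_content(text):
    """Stand-in content encoder: the text information only. In RewriteNet
    this path is trained with a scene text recognizer."""
    return {"text": text}

def generate(style_feat, content_feat):
    """Stand-in generator: fuse the style of one image with new content."""
    return {**style_feat, **content_feat}

# Supervised path (synthetic data): combine the original image's style
# with the content feature of the target text.
original = {"text": "OPEN", "font": "serif", "background": "brick"}
edited = generate(encode_style(original), encode_content("CLOSED"))
# → {'font': 'serif', 'background': 'brick', 'text': 'CLOSED'}

# Self-supervised path (real data, no ground-truth edited image):
# reconstruct the image from its own style and content features.
reconstructed = generate(encode_style(original),
                         encode_content(original["text"]))
assert reconstructed == original
```

The self-supervised branch is what lets real-world images enter training: reconstruction requires no paired "before/after" edit, which is how the paper bridges the synthetic-to-real domain gap.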
Related papers
- TextMastero: Mastering High-Quality Scene Text Editing in Diverse Languages and Styles [12.182588762414058]
Scene text editing aims to modify texts on images while maintaining the style of newly generated text similar to the original.
Recent works leverage diffusion models, showing improved results, yet still face challenges.
We present TextMastero, a carefully designed multilingual scene text editing architecture based on latent diffusion models (LDMs).
arXiv Detail & Related papers (2024-08-20T08:06:09Z)
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only text for STR (DPTR).
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
- Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing [47.421888361871254]
Scene text images contain not only style information (font, background) but also content information (character, texture).
Previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance.
We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling these two types of features for improved adaptability.
arXiv Detail & Related papers (2024-05-07T15:00:11Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy.
We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene.
Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
- APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation [11.186226578337125]
Style-guided text image generation tries to synthesize text image by imitating reference image's appearance.
In this paper, we focus on transferring style image's background and foreground color patterns to the content image to generate photo-realistic text image.
arXiv Detail & Related papers (2022-03-15T07:48:34Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% when its weights are transferred to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- SwapText: Image Based Texts Transfer in Scenes [13.475726959175057]
We present SwapText, a framework to transfer texts across scene images.
A novel text swapping network is proposed to replace text labels only in the foreground image.
The generated foreground image and background image are used to generate the word image by the fusion network.
arXiv Detail & Related papers (2020-03-18T11:02:17Z)
- STEFANN: Scene Text Editor using Font Adaptive Neural Network [18.79337509555511]
We propose a method to modify text in an image at character-level.
We propose two different neural network architectures - (a) FANnet to achieve structural consistency with source font and (b) Colornet to preserve source color.
Our method works as a unified platform for modifying text in images.
arXiv Detail & Related papers (2019-03-04T11:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.