Scene Style Text Editing
- URL: http://arxiv.org/abs/2304.10097v1
- Date: Thu, 20 Apr 2023 05:36:49 GMT
- Title: Scene Style Text Editing
- Authors: Tonghua Su, Fuxiang Yang, Xiang Zhou, Donglin Di, Zhongjie Wang,
Songze Li
- Abstract summary: "QuadNet" is a framework to embed and adjust foreground text styles in latent feature space.
Experiments demonstrate that QuadNet has the ability to generate photo-realistic foreground text and avoid source text shadows in real-world scenes.
- Score: 7.399980683013072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose a task called "Scene Style Text Editing (SSTE)",
changing the text content as well as the text style of the source image while
keeping the original text scene. Existing methods neglect fine-grained
adjustment of the foreground text style, such as its rotation angle, color,
and font type. To tackle this task, we propose a quadruple framework named
"QuadNet" to embed and adjust foreground text styles in the latent feature
space. Specifically, QuadNet consists of four parts, namely background
inpainting, style encoder, content encoder, and fusion generator. The
background inpainting erases the source text content and recovers the
appropriate background with a highly authentic texture. The style encoder
extracts the style embedding of the foreground text. The content encoder
provides target text representations in the latent feature space to implement
the content edits. The fusion generator combines the information produced by
the preceding parts and generates the rendered text image. In practice, our
method performs promisingly on real-world datasets with only string-level
annotation. To the best of our knowledge, our work is the first to finely
manipulate foreground text content and style through deep semantic editing in
the latent feature space. Extensive experiments demonstrate that QuadNet
generates photo-realistic foreground text and avoids source text shadows in
real-world scenes when editing text content.
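Although the abstract does not include code, the four parts it names suggest a simple wiring. Below is a minimal PyTorch-style sketch of such a quadruple forward pass; every module internal, name, and shape is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn

class QuadNet(nn.Module):
    """Toy wiring of the four parts named in the abstract (not the authors' code)."""
    def __init__(self, embed_dim=256, vocab_size=10000):
        super().__init__()
        self.background_inpainter = nn.Identity()                    # stand-in: erase text, restore background
        self.style_encoder = nn.LazyConv2d(embed_dim, 3, padding=1)  # foreground style embedding
        self.content_encoder = nn.Embedding(vocab_size, embed_dim)   # target text representation
        self.fusion_generator = nn.LazyConv2d(3, 3, padding=1)       # renders the edited RGB image

    def forward(self, source_img, target_text_ids):
        background = self.background_inpainter(source_img)   # (B, 3, H, W)
        style = self.style_encoder(source_img)                # (B, D, H, W)
        content = self.content_encoder(target_text_ids)       # (B, T, D)
        # Pool the target text over tokens and broadcast it over spatial positions.
        content_map = content.mean(1)[:, :, None, None].expand_as(style)
        fused = torch.cat([background, style, content_map], dim=1)
        return self.fusion_generator(fused)

img = torch.rand(1, 3, 64, 256)
ids = torch.randint(0, 10000, (1, 8))
edited = QuadNet()(img, ids)  # (1, 3, 64, 256)
```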
Related papers
- First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending [5.3798706094384725]
We propose a new visual text blending paradigm including both creating backgrounds and rendering texts.
Specifically, a background generator is developed to produce high-fidelity and text-free natural images.
We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors.
arXiv Detail & Related papers (2024-10-14T05:23:43Z)
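The two-stage paradigm above (first generate a text-free background, then render text onto it) can be illustrated with a toy pipeline. Both stages below are hypothetical stand-ins, not the paper's actual models.

```python
from PIL import Image, ImageDraw

def generate_background(size=(256, 64)) -> Image.Image:
    # Stand-in for the background generator; a real one would be a generative
    # model trained to produce high-fidelity, text-free natural images.
    return Image.new("RGB", size, color=(182, 204, 221))

def render_text(background: Image.Image, text: str) -> Image.Image:
    # Stand-in for the text renderer that blends foreground text into the scene.
    img = background.copy()
    ImageDraw.Draw(img).text((12, 24), text, fill=(20, 20, 20))
    return img

sample = render_text(generate_background(), "SYNTHETIC")  # one synthetic training image
```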
- Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing [47.421888361871254]
Scene text images contain not only style information (font, background) but also content information (character, texture).
Previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance.
We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling these two types of features for improved adaptability.
arXiv Detail & Related papers (2024-05-07T15:00:11Z)
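The disentangling idea above can be sketched as one shared backbone with separate style and content heads, so each downstream task picks only the features it needs. This is a generic illustration under assumed shapes, not DARLING's actual architecture.

```python
import torch
import torch.nn as nn

class DisentangledEncoder(nn.Module):
    """Shared backbone with separate style and content heads (illustrative only)."""
    def __init__(self, feat_dim=512, out_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.LazyConv2d(feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.style_head = nn.Linear(feat_dim, out_dim)    # font, background, ...
        self.content_head = nn.Linear(feat_dim, out_dim)  # characters, texture, ...

    def forward(self, image):
        h = self.backbone(image)
        return self.style_head(h), self.content_head(h)

style, content = DisentangledEncoder()(torch.rand(1, 3, 32, 128))
# Recognition would consume only `content`; removal and editing could use both.
```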
- TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts [119.84478647745658]
TIP-Editor is a 3D scene editing framework that accepts both text and image prompts, along with a 3D bounding box to specify the editing region.
Experiments demonstrate that TIP-Editor performs accurate editing following the text and image prompts within the specified bounding-box region.
arXiv Detail & Related papers (2024-01-26T12:57:05Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- PSGText: Stroke-Guided Scene Text Editing with PSP Module [4.151658495779136]
Scene Text Editing aims to substitute the text in an image with the desired new text while preserving the background and styles of the original text.
This paper introduces a three-stage framework for transferring texts across text images.
arXiv Detail & Related papers (2023-10-20T09:15:26Z)
- FASTER: A Font-Agnostic Scene Text Editing and Rendering Framework [19.564048493848272]
Scene Text Editing (STE) is a challenging research problem that primarily aims at modifying existing text in an image.
Existing style-transfer-based approaches have shown sub-par editing performance due to complex image backgrounds, diverse font attributes, and varying word lengths within the text.
We propose a novel font-agnostic scene text editing and rendering framework, named FASTER, for simultaneously generating text in arbitrary styles and locations.
arXiv Detail & Related papers (2023-08-05T15:54:06Z)
- TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale dataset of text images with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods that edit the whole image must learn different translation rules for background and text regions simultaneously.
We propose a novel network, MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation [97.36550187238177]
We study a novel task on text-guided image manipulation on the entity level in the real world.
The task imposes three basic requirements: (1) edit the entity consistently with the text description, (2) preserve text-irrelevant regions, and (3) merge the manipulated entity into the image naturally.
Our framework incorporates a semantic alignment module to locate the image regions to be manipulated and a semantic loss to help align vision and language.
arXiv Detail & Related papers (2022-04-09T09:01:19Z)
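A semantic loss of the kind mentioned above can be illustrated with a generic cosine-alignment formulation between region and text embeddings; this simplified stand-in is an assumption, not ManiTrans's exact objective.

```python
import torch
import torch.nn.functional as F

def semantic_alignment_loss(region_feat: torch.Tensor,
                            text_feat: torch.Tensor) -> torch.Tensor:
    """Pull the manipulated region's embedding toward the text embedding.

    region_feat, text_feat: (B, D) outputs of image and text encoders.
    """
    sim = F.cosine_similarity(region_feat, text_feat, dim=-1)  # (B,)
    return (1.0 - sim).mean()  # 0 when the two embeddings are perfectly aligned

loss = semantic_alignment_loss(torch.rand(4, 512), torch.rand(4, 512))
```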
- RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image [17.715320405808935]
Scene text editing (STE) is a challenging task due to a complex intervention between text and style.
We propose a novel representational learning-based STE model, referred to as RewriteNet.
Our experiments demonstrate that RewriteNet achieves better quantitative and qualitative performance than competing methods.
arXiv Detail & Related papers (2021-07-23T06:32:58Z)
- SwapText: Image Based Texts Transfer in Scenes [13.475726959175057]
We present SwapText, a framework to transfer texts across scene images.
A novel text swapping network is proposed to replace text labels only in the foreground image.
The fusion network then combines the generated foreground and background images to produce the final word image.
arXiv Detail & Related papers (2020-03-18T11:02:17Z)
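The fusion step above can be sketched as a small network that concatenates the text-swapped foreground with the recovered background and decodes the word image; the architecture below is an assumption for illustration, not SwapText's actual fusion network.

```python
import torch
import torch.nn as nn

# Fusion step: concatenate the text-swapped foreground with the recovered
# background and decode the final word image.
fusion_net = nn.Sequential(
    nn.LazyConv2d(64, 3, padding=1), nn.ReLU(),
    nn.LazyConv2d(3, 3, padding=1),  # final RGB word image
)

foreground = torch.rand(1, 3, 64, 256)  # swapped-text foreground (toy tensor)
background = torch.rand(1, 3, 64, 256)  # recovered background (toy tensor)
word_image = fusion_net(torch.cat([foreground, background], dim=1))
```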