SwapText: Image Based Texts Transfer in Scenes
- URL: http://arxiv.org/abs/2003.08152v1
- Date: Wed, 18 Mar 2020 11:02:17 GMT
- Title: SwapText: Image Based Texts Transfer in Scenes
- Authors: Qiangpeng Yang, Hongsheng Jin, Jun Huang, Wei Lin
- Abstract summary: We present SwapText, a framework to transfer texts across scene images.
A novel text swapping network is proposed to replace text labels only in the foreground image.
The generated foreground image and background image are used to generate the word image by the fusion network.
- Score: 13.475726959175057
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Swapping text in scene images while preserving original fonts, colors, sizes
and background textures is a challenging task due to the complex interplay
between different factors. In this work, we present SwapText, a three-stage
framework to transfer texts across scene images. First, a novel text swapping
network is proposed to replace text labels only in the foreground image.
Second, a background completion network is learned to reconstruct background
images. Finally, the generated foreground image and background image are used
to generate the word image by the fusion network. Using the proposed
framework, we can manipulate the text of input images even under severe
geometric distortion. Qualitative and quantitative results are presented on
several scene text datasets, including regular and irregular text datasets. We
conducted extensive experiments to demonstrate the usefulness of our method in
applications such as image-based text translation and text image synthesis.
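The paper does not include reference code, so the following is only a minimal sketch of how the three stages described above might compose; the module names (SwapTextPipeline, swap_net, background_net, fusion_net) are hypothetical and not the authors' implementation.

```python
import torch
import torch.nn as nn

class SwapTextPipeline(nn.Module):
    """Hypothetical composition of the three stages described in the abstract."""

    def __init__(self, swap_net: nn.Module, background_net: nn.Module, fusion_net: nn.Module):
        super().__init__()
        self.swap_net = swap_net              # stage 1: text swapping network (foreground)
        self.background_net = background_net  # stage 2: background completion network
        self.fusion_net = fusion_net          # stage 3: fusion network

    def forward(self, source_image: torch.Tensor, target_text: torch.Tensor) -> torch.Tensor:
        # Stage 1: render the target text with the font, color and geometry of the source text.
        foreground = self.swap_net(source_image, target_text)
        # Stage 2: reconstruct a text-free version of the source background.
        background = self.background_net(source_image)
        # Stage 3: blend the swapped foreground into the completed background.
        return self.fusion_net(foreground, background)
```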
Related papers
- First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending [5.3798706094384725]
We propose a new visual text blending paradigm including both creating backgrounds and rendering texts.
Specifically, a background generator is developed to produce high-fidelity and text-free natural images.
We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors.
arXiv Detail & Related papers (2024-10-14T05:23:43Z)
- CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction [23.683636588751753]
State-of-the-art inpainting methods are mainly designed for natural images and cannot correctly recover text within scene text images.
We identify the visual-text inpainting task to achieve high-quality scene text image restoration and text completion.
arXiv Detail & Related papers (2024-07-23T06:12:19Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- PSGText: Stroke-Guided Scene Text Editing with PSP Module [4.151658495779136]
Scene Text Editing aims to substitute text in an image with new desired text while preserving the background and styles of the original text.
This paper introduces a three-stage framework for transferring texts across text images.
arXiv Detail & Related papers (2023-10-20T09:15:26Z)
- TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable, creating high-quality text images from text prompts alone or together with text template images, and performing text inpainting to reconstruct incomplete images containing text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation [97.36550187238177]
We study a novel task on text-guided image manipulation on the entity level in the real world.
The task imposes three basic requirements: (1) to edit the entity consistently with the text descriptions, (2) to preserve the text-irrelevant regions, and (3) to merge the manipulated entity into the image naturally.
Our framework incorporates a semantic alignment module to locate the image regions to be manipulated, and a semantic loss to help align the relationship between the vision and language.
arXiv Detail & Related papers (2022-04-09T09:01:19Z)
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy.
We propose a novel text-to-image method that addresses these gaps by enabling a simple control mechanism complementary to text in the form of a scene.
Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
- RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image [17.715320405808935]
Scene text editing (STE) is a challenging task due to a complex interaction between text and style.
We propose a novel representational learning-based STE model, referred to as RewriteNet.
Our experiments demonstrate that RewriteNet achieves better quantitative and qualitative performance than competing methods.
arXiv Detail & Related papers (2021-07-23T06:32:58Z)
- DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis [80.54273334640285]
We propose a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators.
We also propose a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output (a minimal sketch of such a penalty appears after this list).
Compared with current state-of-the-art methods, our proposed DF-GAN is simpler yet more efficient at synthesizing realistic and text-matching images.
arXiv Detail & Related papers (2020-08-13T12:51:17Z)
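The DF-GAN entry above mentions a Matching-Aware Gradient Penalty: a one-sided gradient penalty computed only on real images paired with their matching text, which encourages the discriminator to be smooth around real-and-matching data. The sketch below is an illustrative PyTorch version under stated assumptions; the discriminator interface and the defaults k and p are assumptions here, not taken verbatim from the paper.

```python
import torch

def matching_aware_gradient_penalty(discriminator, real_images, sent_embs, k=2.0, p=6):
    """Gradient penalty on (real image, matching text) pairs, in the spirit of a matching-aware penalty.

    Assumes `discriminator(images, sent_embs)` returns one real/fake score per sample.
    """
    real_images = real_images.detach().requires_grad_(True)
    sent_embs = sent_embs.detach().requires_grad_(True)
    scores = discriminator(real_images, sent_embs)
    grads = torch.autograd.grad(
        outputs=scores.sum(),
        inputs=(real_images, sent_embs),
        create_graph=True,
    )
    grad_img = grads[0].reshape(real_images.size(0), -1)
    grad_txt = grads[1].reshape(sent_embs.size(0), -1)
    grad_norm = torch.sqrt((grad_img ** 2).sum(dim=1) + (grad_txt ** 2).sum(dim=1))
    # Penalize large gradients only at real, matching data points, so the
    # discriminator's smooth (low-gradient) region is pushed toward them.
    return k * (grad_norm ** p).mean()
```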
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.