FAST: Font-Agnostic Scene Text Editing
- URL: http://arxiv.org/abs/2308.02905v1
- Date: Sat, 5 Aug 2023 15:54:06 GMT
- Title: FAST: Font-Agnostic Scene Text Editing
- Authors: Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada
Pal, Michael Blumenstein
- Abstract summary: Scene Text Editing (STE) aims to modify existing texts in an image while preserving the background and the font style of the original text.
Most of the existing STE methods show inferior editing performance because of complex image backgrounds, various font styles, and varying word lengths within the text.
We propose a novel font-agnostic scene text editing framework, named FAST, for simultaneously generating text in arbitrary styles and locations.
- Score: 22.666387184216678
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scene Text Editing (STE) is a challenging research problem that aims to
modify existing texts in an image while preserving the background and the font
style of the original text. Due to its various real-life
applications, researchers have explored several approaches toward STE in recent
years. However, most of the existing STE methods show inferior editing
performance because of (1) complex image backgrounds, (2) various font styles,
and (3) varying word lengths within the text. To address such inferior editing
performance issues, in this paper, we propose a novel font-agnostic scene text
editing framework, named FAST, for simultaneously generating text in arbitrary
styles and locations while preserving a natural and realistic appearance
through combined mask generation and style transfer. The proposed approach
differs from existing methods, which directly modify all image pixels.
Instead, it introduces a filtering mechanism to remove
background distractions, allowing the network to focus solely on the text
regions where editing is required. Additionally, a text-style transfer module
is designed to mitigate the challenges posed by varying word lengths.
Extensive experiments and ablations have been conducted, and the results
demonstrate that the proposed method outperforms the existing methods both
qualitatively and quantitatively.
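The filtering idea described above can be illustrated with a minimal compositing sketch: a text-region mask gates the generator's output so that background pixels pass through unchanged. This is not the authors' implementation; the function name, array shapes, and NumPy-based formulation are assumptions for illustration only.

```python
import numpy as np

def composite_edit(image, edited, mask):
    """Blend an edited output into the original image via a text-region mask.

    image, edited: float arrays of shape (H, W, C), values in [0, 1]
    mask: float array of shape (H, W, 1); 1.0 inside text regions, 0.0 elsewhere

    Pixels outside the mask are copied from the original image, so the
    background is preserved exactly while only text regions are rewritten.
    """
    return mask * edited + (1.0 - mask) * image

# Toy example: edit only the left column of a 2x2 image.
image = np.zeros((2, 2, 3))   # original: all black
edited = np.ones((2, 2, 3))   # generator output: all white
mask = np.array([[[1.0], [0.0]],
                 [[1.0], [0.0]]])
out = composite_edit(image, edited, mask)
# Left column comes from `edited`; right column keeps the original pixels.
```

In the actual framework the mask would itself be predicted by a network rather than supplied by hand, but the compositing step keeps the same structure.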
Related papers
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement [59.66539728681453]
Scene text image super-resolution (STISR) aims to improve image quality while boosting downstream scene text recognition accuracy.
Most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process.
We propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution.
arXiv Detail & Related papers (2023-07-19T05:08:47Z)
- Improving Diffusion Models for Scene Text Editing with Dual Encoders [44.12999932588205]
Scene text editing is a challenging task that involves modifying or inserting specified texts in an image.
Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing.
We propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design.
arXiv Detail & Related papers (2023-04-12T02:08:34Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models [0.0]
We propose an optimization-free and zero fine-tuning framework that applies complex and non-rigid edits to a single real image via a text prompt.
We prove our method's efficacy in producing high-quality, diverse, semantically coherent, and faithful real image edits.
arXiv Detail & Related papers (2022-11-15T01:07:38Z)
- Prompt-to-Prompt Image Editing with Cross Attention Control [41.26939787978142]
We present an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only.
We show our results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.
arXiv Detail & Related papers (2022-08-02T17:55:41Z)
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy.
We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene.
Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
- STEFANN: Scene Text Editor using Font Adaptive Neural Network [18.79337509555511]
We propose a method to modify text in an image at character-level.
We propose two different neural network architectures - (a) FANnet to achieve structural consistency with source font and (b) Colornet to preserve source color.
Our method works as a unified platform for modifying text in images.
arXiv Detail & Related papers (2019-03-04T11:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.