FAST: Font-Agnostic Scene Text Editing
- URL: http://arxiv.org/abs/2308.02905v1
- Date: Sat, 5 Aug 2023 15:54:06 GMT
- Title: FAST: Font-Agnostic Scene Text Editing
- Authors: Alloy Das, Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada
Pal, Michael Blumenstein
- Abstract summary: Scene Text Editing (STE) aims to modify existing texts in an image while preserving the background and the font style of the original text.
Most of the existing STE methods show inferior editing performance because of complex image backgrounds, various font styles, and varying word lengths within the text.
We propose a novel font-agnostic scene text editing framework, named FAST, for simultaneously generating text in arbitrary styles and locations.
- Score: 22.666387184216678
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scene Text Editing (STE) is a challenging research problem that aims to
modify existing texts in an image while preserving the background and the font
style of the original text. Due to its various real-life
applications, researchers have explored several approaches toward STE in recent
years. However, most of the existing STE methods show inferior editing
performance because of (1) complex image backgrounds, (2) various font styles,
and (3) varying word lengths within the text. To address such inferior editing
performance issues, in this paper, we propose a novel font-agnostic scene text
editing framework, named FAST, for simultaneously generating text in arbitrary
styles and locations while preserving a natural and realistic appearance
through combined mask generation and style transfer. The proposed approach
differs from existing methods, which directly modify all image pixels.
Instead, it introduces a filtering mechanism to remove
background distractions, allowing the network to focus solely on the text
regions where editing is required. Additionally, a text-style transfer module
is designed to mitigate the challenges posed by varying word lengths.
Extensive experiments and ablations have been conducted, and the results
demonstrate that the proposed method outperforms the existing methods both
qualitatively and quantitatively.
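The filtering idea described above can be illustrated with a minimal compositing sketch: a text-region mask gates the generator's output so that background pixels pass through unchanged. This is not the authors' implementation; the function name, array shapes, and NumPy-based formulation are assumptions for illustration only.

```python
import numpy as np

def composite_edit(image, edited, mask):
    """Blend an edited output into the original image via a text-region mask.

    image, edited: float arrays of shape (H, W, C), values in [0, 1]
    mask: float array of shape (H, W, 1); 1.0 inside text regions, 0.0 elsewhere

    Pixels outside the mask are copied from the original image, so the
    background is preserved exactly while only text regions are rewritten.
    """
    return mask * edited + (1.0 - mask) * image

# Toy example: edit only the left column of a 2x2 image.
image = np.zeros((2, 2, 3))   # original: all black
edited = np.ones((2, 2, 3))   # generator output: all white
mask = np.array([[[1.0], [0.0]],
                 [[1.0], [0.0]]])
out = composite_edit(image, edited, mask)
# Left column comes from `edited`; right column keeps the original pixels.
```

In the actual framework the mask would itself be predicted by a network rather than supplied by hand, but the compositing step keeps the same structure.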
Related papers
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Text-Driven Image Editing via Learnable Regions [74.45313434129005]
We introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches.
We show that this simple approach enables flexible editing that is compatible with current image generation models.
Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that correspond to the provided language descriptions.
arXiv Detail & Related papers (2023-11-28T02:27:31Z)
- Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement [59.66539728681453]
Scene text image super-resolution (STISR) aims to improve image quality while boosting downstream scene text recognition accuracy.
Most existing methods treat the foreground (character regions) and background (non-character regions) equally in the forward process.
We propose a novel method LEMMA that explicitly models character regions to produce high-level text-specific guidance for super-resolution.
arXiv Detail & Related papers (2023-07-19T05:08:47Z)
- Improving Diffusion Models for Scene Text Editing with Dual Encoders [44.12999932588205]
Scene text editing is a challenging task that involves modifying or inserting specified texts in an image.
Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing.
We propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design.
arXiv Detail & Related papers (2023-04-12T02:08:34Z)
- Exploring Stroke-Level Modifications for Scene Text Editing [86.33216648792964]
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text.
Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously.
We propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL).
arXiv Detail & Related papers (2022-12-05T02:10:59Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models [0.0]
We propose an optimization-free and zero fine-tuning framework that applies complex and non-rigid edits to a single real image via a text prompt.
We prove our method's efficacy in producing high-quality, diverse, semantically coherent, and faithful real image edits.
arXiv Detail & Related papers (2022-11-15T01:07:38Z)
- Prompt-to-Prompt Image Editing with Cross Attention Control [41.26939787978142]
We present an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only.
We show our results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.
arXiv Detail & Related papers (2022-08-02T17:55:41Z)
- Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors [58.71128866226768]
Recent text-to-image generation methods have incrementally improved the generated image fidelity and text relevancy.
We propose a novel text-to-image method that addresses these gaps by (i) enabling a simple control mechanism complementary to text in the form of a scene.
Our model achieves state-of-the-art FID and human evaluation results, unlocking the ability to generate high fidelity images in a resolution of 512x512 pixels.
arXiv Detail & Related papers (2022-03-24T15:44:50Z)
- STEFANN: Scene Text Editor using Font Adaptive Neural Network [18.79337509555511]
We propose a method to modify text in an image at character-level.
We propose two different neural network architectures - (a) FANnet to achieve structural consistency with source font and (b) Colornet to preserve source color.
Our method works as a unified platform for modifying text in images.
arXiv Detail & Related papers (2019-03-04T11:56:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.