Typographic Text Generation with Off-the-Shelf Diffusion Model
- URL: http://arxiv.org/abs/2402.14314v1
- Date: Thu, 22 Feb 2024 06:15:51 GMT
- Title: Typographic Text Generation with Off-the-Shelf Diffusion Model
- Authors: KhayTze Peong, Seiichi Uchida, Daichi Haraguchi
- Abstract summary: This paper proposes a typographic text generation system to add and modify text on typographic designs.
The proposed system is a novel combination of two off-the-shelf methods for diffusion models, ControlNet and Blended Latent Diffusion.
- Score: 7.542892664684078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent diffusion-based generative models show promise in their ability to
generate text images, but limitations in specifying the styles of the generated
texts render them insufficient in the realm of typographic design. This paper
proposes a typographic text generation system to add and modify text on
typographic designs while specifying font styles, colors, and text effects. The
proposed system is a novel combination of two off-the-shelf methods for
diffusion models, ControlNet and Blended Latent Diffusion. The former functions
to generate text images under the guidance of edge conditions specifying stroke
contours. The latter blends latent noise in Latent Diffusion Models (LDM) to
add typographic text naturally onto an existing background. We first show that
given appropriate text edges, ControlNet can generate texts in specified fonts
while incorporating effects described by prompts. We further introduce text
edge manipulation as an intuitive and customizable way to produce texts with
complex effects such as ``shadows'' and ``reflections''. Finally, with the
proposed system, we successfully add and modify texts on a predefined
background while preserving its overall coherence.
Related papers
- First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending [5.3798706094384725]
We propose a new visual text blending paradigm including both creating backgrounds and rendering texts.
Specifically, a background generator is developed to produce high-fidelity and text-free natural images.
We also explore several downstream applications based on our method, including scene text dataset synthesis for boosting scene text detectors.
arXiv Detail & Related papers (2024-10-14T05:23:43Z) - Layout Agnostic Scene Text Image Synthesis with Diffusion Models [42.37340959594495]
SceneTextGen is a novel diffusion-based model specifically designed to circumvent the need for a predefined layout stage.
The novelty of SceneTextGen lies in its integration of three key components: a character-level encoder for capturing detailed typographic properties and a character-level instance segmentation model and a word-level spotting model to address the issues of unwanted text generation and minor character inaccuracies.
arXiv Detail & Related papers (2024-06-03T07:20:34Z) - UDiffText: A Unified Framework for High-quality Text Synthesis in
Arbitrary Images via Character-aware Diffusion Models [25.219960711604728]
This paper proposes a novel approach for text image generation, utilizing a pre-trained diffusion model.
Our approach involves the design and training of a light-weight character-level text encoder, which replaces the original CLIP encoder.
By employing an inference stage refinement process, we achieve a notably high sequence accuracy when synthesizing text in arbitrarily given images.
arXiv Detail & Related papers (2023-12-08T07:47:46Z) - Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using
Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z) - ControlStyle: Text-Driven Stylized Image Generation Using Diffusion
Priors [105.37795139586075]
We propose a new task for stylizing'' text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z) - TextDiffuser: Diffusion Models as Text Painters [118.30923824681642]
We introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds.
We contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs.
We show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text.
arXiv Detail & Related papers (2023-05-18T10:16:19Z) - GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements compared to the recent diffusion model.
arXiv Detail & Related papers (2023-04-25T02:14:44Z) - Improving Diffusion Models for Scene Text Editing with Dual Encoders [44.12999932588205]
Scene text editing is a challenging task that involves modifying or inserting specified texts in an image.
Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing.
We propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design.
arXiv Detail & Related papers (2023-04-12T02:08:34Z) - Unified Multi-Modal Latent Diffusion for Joint Subject and Text
Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) which takes joint texts and images containing specified subjects as input sequences.
To be more specific, both input texts and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics from both aspects of input texts and images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z) - SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.