Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation
- URL: http://arxiv.org/abs/2504.21325v1
- Date: Wed, 30 Apr 2025 05:24:49 GMT
- Title: Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation
- Authors: Abdul Sami, Avinash Kumar, Irfanullah Memon, Youngwon Jo, Muhammad Rizwan, Jaeyoung Choi
- Abstract summary: Automatic font generation (AFG) is the process of creating a new font using only a few examples of the style images. We present a diffusion-based AFG method which generates high-quality, diverse Korean font images. A key innovation is our text encoder, which processes phonetic representations to generate accurate and contextually correct characters.
- Score: 7.281838207050202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic font generation (AFG) is the process of creating a new font using only a few examples of the style images. Generating fonts for complex languages like Korean and Chinese, particularly in handwritten styles, presents significant challenges. Traditional AFG methods, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), are usually unstable during training and often face mode collapse; they also struggle to capture fine details within font images. To address these problems, we present a diffusion-based AFG method which generates high-quality, diverse Korean font images using only a single reference image, focusing on handwritten and printed styles. Our approach refines noisy images incrementally, ensuring stable training and visually appealing results. A key innovation is our text encoder, which processes phonetic representations to generate accurate and contextually correct characters, even for unseen characters. We use a pre-trained style encoder from DG-Font to effectively and accurately encode the style images. To further enhance generation quality, we use a perceptual loss that guides the model to focus on the global style of generated images. Experimental results on over 2,000 Korean characters demonstrate that our model consistently generates accurate and detailed font images and outperforms benchmark methods, making it a reliable tool for generating authentic Korean fonts across different styles.
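Below is a minimal, illustrative PyTorch sketch of the pipeline the abstract describes: a denoising step conditioned on a phonetic text embedding and a style embedding from a single reference image, trained with a noise-prediction loss plus a perceptual term. Module names, layer sizes, and the stand-in feature extractor are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhoneticTextEncoder(nn.Module):
    """Embeds a sequence of phonetic (e.g. jamo-level) token ids into one vector."""
    def __init__(self, vocab_size=80, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, tokens):                     # tokens: (B, T) int64
        h, _ = self.rnn(self.embed(tokens))
        return h.mean(dim=1)                       # (B, dim)

class StyleEncoder(nn.Module):
    """Toy stand-in for a pre-trained style encoder (e.g. the one taken from DG-Font)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, ref_img):                    # ref_img: (B, 1, H, W)
        return self.net(ref_img)                   # (B, dim)

class ConditionalDenoiser(nn.Module):
    """Predicts the noise in x_t given the timestep, text embedding and style embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.cond = nn.Linear(2 * dim + 1, dim)
        self.net = nn.Sequential(
            nn.Conv2d(1 + dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, x_t, t, text_emb, style_emb):
        c = self.cond(torch.cat([text_emb, style_emb, t[:, None].float()], dim=1))
        c = c[:, :, None, None].expand(-1, -1, *x_t.shape[2:])
        return self.net(torch.cat([x_t, c], dim=1))

def training_step(x0, tokens, ref_img, text_enc, style_enc, denoiser, feat_net,
                  alphas_cumprod, lam=0.1):
    """One DDPM-style step: noise-prediction MSE plus a perceptual term."""
    t = torch.randint(0, len(alphas_cumprod), (x0.size(0),))
    a = alphas_cumprod[t][:, None, None, None]
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise            # forward diffusion
    eps_hat = denoiser(x_t, t, text_enc(tokens), style_enc(ref_img))
    x0_hat = (x_t - (1 - a).sqrt() * eps_hat) / a.sqrt()    # estimate of the clean glyph
    diffusion_loss = F.mse_loss(eps_hat, noise)
    # Perceptual term: compare features of the real and reconstructed glyphs under a
    # frozen feature network (a pre-trained network would be used in practice).
    perceptual_loss = F.l1_loss(feat_net(x0_hat), feat_net(x0))
    return diffusion_loss + lam * perceptual_loss
```

In practice, `alphas_cumprod` would come from a standard noise schedule, `feat_net` would be a frozen pre-trained network supplying the perceptual loss, and the style encoder would be the pre-trained DG-Font encoder rather than the toy CNN above.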
Related papers
- Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models [76.68654868991517]
Long-form text in images, such as paragraphs in slides or documents, remains a major challenge for current generative models. We introduce a novel text-focused, binary tokenizer optimized for capturing detailed scene text features. We develop ModelName, a multimodal autoregressive model that excels in generating high-quality long-text images with unprecedented fidelity.
arXiv Detail & Related papers (2025-03-26T03:44:25Z)
- Zero-Shot Styled Text Image Generation, but Make It Autoregressive [34.09957000751439]
Styled Handwritten Text Generation (HTG) has recently received attention from the computer vision and document analysis communities. We propose a novel framework for text image generation, dubbed Emuru. Our approach leverages a powerful text image representation model (a variational autoencoder) combined with an autoregressive Transformer.
arXiv Detail & Related papers (2025-03-21T11:56:20Z)
- Skeleton and Font Generation Network for Zero-shot Chinese Character Generation [53.08596064763731]
We propose a novel Skeleton and Font Generation Network (SFGN) to achieve more robust Chinese character font generation. We conduct experiments on misspelled characters, a substantial portion of which differ only slightly from the common ones. Visual results demonstrate the efficacy of the generated images, and our approach outperforms current state-of-the-art font generation methods.
arXiv Detail & Related papers (2025-01-14T12:15:49Z)
- JoyType: A Robust Design for Multilingual Visual Text Creation [14.441897362967344]
We introduce a novel approach for multilingual visual text creation, named JoyType.
JoyType is designed to maintain the font style of text during the image generation process.
Our evaluations, based on both visual and accuracy metrics, demonstrate that JoyType significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-09-26T04:23:17Z)
- DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation [1.0044057719679087]
We propose a novel diffusion method for generating glyphs in a targeted style from a single conditioned, standard glyph form.
Our approach shows remarkable zero-shot generalization capabilities for non-CJK but Chinese-inspired scripts.
In summary, our proposed method opens the door to high-quality, generative model-assisted font creation for CJK characters.
arXiv Detail & Related papers (2024-04-08T05:58:07Z)
- FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning [45.696909070215476]
FontDiffuser is a diffusion-based image-to-image one-shot font generation method.
It consistently excels on complex characters and large style changes compared to previous methods.
arXiv Detail & Related papers (2023-12-19T13:23:20Z)
- VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization [52.870638830417]
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
Specifically, we pre-train a VQGAN to encapsulate font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes.
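As a rough illustration of the codebook refinement summarized above (not the VQ-Font implementation; shapes, names, and the toy usage below are assumptions), the core operation snaps synthesized-glyph features to their nearest entries in a pre-trained codebook:

```python
import torch

def quantize_to_codebook(feat, codebook):
    """feat: (B, C, H, W) encoder features; codebook: (K, C) learned entries."""
    B, C, H, W = feat.shape
    flat = feat.permute(0, 2, 3, 1).reshape(-1, C)           # (B*H*W, C)
    dists = torch.cdist(flat, codebook)                       # (B*H*W, K)
    idx = dists.argmin(dim=1)                                 # nearest code per location
    quant = codebook[idx].reshape(B, H, W, C).permute(0, 3, 1, 2)
    return quant, idx.reshape(B, H, W)

# Toy usage: refine synthesized-glyph features with a 512-entry codebook.
codebook = torch.randn(512, 64)
synth_feat = torch.randn(2, 64, 16, 16)
refined, codes = quantize_to_codebook(synth_feat, codebook)   # (2, 64, 16, 16), (2, 16, 16)
```

Pulling synthesized features toward learned real-font tokens is what narrows the domain gap between synthesized and real-world strokes.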
arXiv Detail & Related papers (2023-08-27T06:32:20Z)
- DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation [19.473023811252116]
We propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++).
To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently.
Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
arXiv Detail & Related papers (2022-12-30T14:35:10Z)
- Diff-Font: Diffusion Model for Robust One-Shot Font Generation [110.45944936952309]
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font.
The proposed model aims to generate the entire font library by giving only one sample as the reference.
The well-trained Diff-Font is not only robust to font gaps and font variation, but also achieves promising performance on difficult character generation.
arXiv Detail & Related papers (2022-12-12T13:51:50Z)
- Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We explore the generation of font glyphs as 2D graphic objects, using a graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image renderer.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
- DG-Font: Deformable Generative Networks for Unsupervised Font Generation [14.178381391124036]
We propose novel deformable generative networks for unsupervised font generation (DG-Font).
We introduce a feature deformation skip connection (FDSC) which predicts pairs of displacement maps and employs the predicted maps to apply deformable convolution to the low-level feature maps from the content encoder.
Experiments demonstrate that our model generates characters of higher quality than state-of-the-art methods (a sketch of the FDSC idea follows this entry).
arXiv Detail & Related papers (2021-04-07T11:32:32Z)
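Since the main paper above reuses a pre-trained style encoder from DG-Font, a brief sketch of DG-Font's feature deformation skip connection (FDSC) may help: displacement maps are predicted and used to apply a deformable convolution to low-level content features before they are passed to the decoder as a skip connection. The channel sizes and the choice to predict offsets from concatenated content/guiding features are illustrative assumptions, not details taken from the DG-Font paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class FDSCBlock(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # One (dy, dx) pair per kernel location -> 2 * k * k offset channels.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, content_feat, guide_feat):
        # Displacement maps conditioned on both content and guiding features.
        offsets = self.offset_pred(torch.cat([content_feat, guide_feat], dim=1))
        # Deformably sample the low-level content features with those offsets;
        # the result would be passed to the matching decoder layer as a skip.
        return self.deform(content_feat, offsets)

# Toy usage with random low-level feature maps.
block = FDSCBlock(channels=64)
content = torch.randn(1, 64, 32, 32)
guide = torch.randn(1, 64, 32, 32)
skip = block(content, guide)        # (1, 64, 32, 32)
```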
This list is automatically generated from the titles and abstracts of the papers in this site.