Layout Stroke Imitation: A Layout Guided Handwriting Stroke Generation for Style Imitation with Diffusion Model
- URL: http://arxiv.org/abs/2509.15678v1
- Date: Fri, 19 Sep 2025 06:53:17 GMT
- Title: Layout Stroke Imitation: A Layout Guided Handwriting Stroke Generation for Style Imitation with Diffusion Model
- Authors: Sidra Hanif, Longin Jan Latecki
- Abstract summary: This work proposes multi-scale attention features for calligraphic style imitation. These multi-scale feature embeddings highlight the local and global style features. Secondly, we propose a conditional diffusion model to predict strokes, in contrast to previous work, which directly generated style images.
- Score: 8.457315999229907
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Handwriting stroke generation is crucial for improving the performance of tasks such as handwriting recognition and writer order recovery. In handwriting stroke generation, imitating the sample's calligraphic style is especially important. Previous studies have suggested utilizing the calligraphic features of the handwriting, but they did not consider word spacing (word layout) as an explicit handwriting feature, which results in inconsistent word spacing in style imitation. Firstly, this work proposes multi-scale attention features for calligraphic style imitation; these multi-scale feature embeddings highlight both local and global style features. Secondly, we propose to include the word layout, which facilitates consistent word spacing in handwriting stroke generation. Moreover, we propose a conditional diffusion model that predicts strokes, in contrast to previous work that directly generated style images. Stroke generation provides additional temporal coordinate information, which is lacking in image generation. Hence, our conditional diffusion model for stroke generation is guided by calligraphic style and word layout for better handwriting imitation and stroke generation in a calligraphic style. Our experiments show that the proposed diffusion model outperforms the current state of the art in stroke generation and is competitive with recent image generation networks.
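The abstract describes a conditional diffusion model that denoises stroke sequences under style and layout guidance. The following minimal sketch illustrates the general DDPM-style forward/reverse mechanics over a stroke sequence with style and layout conditioning vectors; all names and shapes are hypothetical, and the placeholder denoiser stands in for the paper's actual conditional network, which is not reproduced here.

```python
# Toy sketch of conditional diffusion over pen strokes. A "stroke" is a flat
# list of (dx, dy, pen_lift) triples; the denoiser is a placeholder for the
# paper's style- and layout-conditioned network (hypothetical interface).
import math
import random

T = 50  # number of diffusion steps
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # linear schedule
alphas = [1.0 - b for b in betas]
alpha_bars = []
acc = 1.0
for a in alphas:
    acc *= a
    alpha_bars.append(acc)  # cumulative products, strictly decreasing in t

def add_noise(x0, t, eps):
    """Forward process q(x_t | x_0): sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps."""
    s1, s2 = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
    return [s1 * x + s2 * e for x, e in zip(x0, eps)]

def denoiser(xt, t, style_emb, layout_emb):
    """Placeholder for eps_theta(x_t, t, style, layout). A real model would
    attend over the style and layout embeddings; here we return zeros of the
    right shape so the sketch runs end to end."""
    return [0.0 for _ in xt]

def reverse_step(xt, t, style_emb, layout_emb, rng):
    """One DDPM reverse step: mean = (x_t - beta_t/sqrt(1-a_bar_t)*eps_hat)/sqrt(alpha_t)."""
    eps_hat = denoiser(xt, t, style_emb, layout_emb)
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise added at the final step
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]

rng = random.Random(0)
x0 = [0.1, -0.2, 0.0, 0.3, 0.1, 1.0]             # two stroke points
style_emb = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in style features
layout_emb = [0.5, 1.2]                          # stand-in word-spacing cues

eps = [rng.gauss(0.0, 1.0) for _ in x0]
xt = add_noise(x0, T - 1, eps)                   # fully noised strokes
x_prev = reverse_step(xt, T - 1, style_emb, layout_emb, rng)
```

Conditioning on temporal stroke coordinates rather than pixels is what gives this formulation the writing-order information that image-based generators lack; in the sketch, that shows up simply as the denoiser receiving the stroke sequence and the two conditioning vectors at every step.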
Related papers
- ScriptViT: Vision Transformer-Based Personalized Handwriting Generation [0.0]
Styled handwriting generation aims to synthesize handwritten text that looks both realistic and aligned with a specific writer's style. We introduce a Vision Transformer-based style encoder that learns global stylistic patterns from multiple reference images. We then integrate these style cues with the target text using a cross-attention mechanism, enabling the system to produce handwritten images that more faithfully reflect the intended style.
arXiv Detail & Related papers (2025-11-23T06:38:23Z) - Autoregressive Styled Text Image Generation, but Make it Reliable [51.09340470015673]
This work is dedicated to developing strategies that reproduce the characteristics of a given writer, with promising results in terms of style fidelity and generalization achieved by the recently proposed Autoregressive Transformer paradigm for HTG. In this work, we rethink the autoregressive paradigm by framing HTG as a multimodal prompt-conditioned generation task, tackling content controllability issues by introducing special input tokens for better alignment with the visual ones.
arXiv Detail & Related papers (2025-10-27T11:54:23Z) - DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation [41.08176249345279]
DiffInk is the first latent diffusion Transformer framework for full-line handwriting generation. We first introduce InkVAE, a novel sequential variational autoencoder enhanced with two complementary latent-space regularization losses. We then introduce InkDiT, a novel latent diffusion Transformer that integrates target text and reference styles to generate coherent pen trajectories.
arXiv Detail & Related papers (2025-09-28T03:58:15Z) - Calligrapher: Freestyle Text Image Customization [72.71919410487881]
Calligrapher is a novel diffusion-based framework that integrates advanced text customization with artistic typography. By automating high-quality, visually consistent typography, Calligrapher surpasses traditional models.
arXiv Detail & Related papers (2025-06-30T17:59:06Z) - DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a 5-shot style handwritten text generation approach based on Latent Diffusion Models.
Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples.
Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z) - Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models [13.41869920770082]
We introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions.
We enhance the diffusion model's attention mechanism with adaptive 2D positional encoding, as well as its conditioning mechanism.
It outperforms all existing imitation methods at both line and paragraph levels, considering combined style and content preservation.
arXiv Detail & Related papers (2024-09-01T17:33:31Z) - Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal attention sharing during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z) - ControlStyle: Text-Driven Stylized Image Generation Using Diffusion
Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z) - Towards Diverse and Consistent Typography Generation [15.300255326619203]
We formulate typography generation as a fine-grained attribute generation for multiple text elements.
We build an autoregressive model to generate diverse typography that matches the input design context.
arXiv Detail & Related papers (2023-09-05T10:08:11Z) - GlyphDiffusion: Text Generation as Image Generation [100.98428068214736]
We propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation.
Our key idea is to render the target text as a glyph image containing visual language content.
Our model also makes significant improvements compared to recent diffusion models.
arXiv Detail & Related papers (2023-04-25T02:14:44Z) - WordStylist: Styled Verbatim Handwritten Text Generation with Latent
Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for word-level, text-content-conditioned styled handwriting image generation.
Our proposed method is able to generate realistic word image samples from different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z) - Generating Handwriting via Decoupled Style Descriptors [28.31500214381889]
We introduce the Decoupled Style Descriptor model for handwriting.
It factors both character- and writer-level styles and allows our model to represent an overall greater space of styles.
In experiments, our generated results were preferred over a state-of-the-art baseline method 88% of the time.
arXiv Detail & Related papers (2020-08-26T02:52:48Z) - Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues.
We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.