Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models
- URL: http://arxiv.org/abs/2409.00786v1
- Date: Sun, 1 Sep 2024 17:33:31 GMT
- Title: Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models
- Authors: Martin Mayr, Marcel Dreier, Florian Kordon, Mathias Seuret, Jochen Zöllner, Fei Wu, Andreas Maier, Vincent Christlein
- Abstract summary: We introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions.
We enhance the attention mechanism of the diffusion model with adaptive 2D positional encoding and the conditioning mechanism.
It outperforms all existing imitation methods at both line and paragraph levels, considering combined style and content preservation.
- Score: 13.41869920770082
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The imitation of cursive handwriting is mainly limited to generating handwritten words or lines. Multiple synthetic outputs must be stitched together to create paragraphs or whole pages, whereby consistency and layout information are lost. To close this gap, we propose a method for imitating handwriting at the paragraph level that also works for unseen writing styles. Therefore, we introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions that explicitly preserve the style and content. We enhance the attention mechanism of the diffusion model with adaptive 2D positional encoding and the conditioning mechanism to work with two modalities simultaneously: a style image and the target text. This significantly improves the realism of the generated handwriting. Our approach sets a new benchmark in our comprehensive evaluation. It outperforms all existing imitation methods at both line and paragraph levels, considering combined style and content preservation.
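The abstract names an adaptive 2D positional encoding as one ingredient of the enhanced attention mechanism, but this listing does not spell out its construction. As a rough illustration of the underlying idea only, here is a minimal NumPy sketch of a fixed (non-adaptive) 2D sinusoidal positional encoding, where half the channels encode the row position and half the column position; the function name and channel layout are assumptions, not the paper's implementation.

```python
import numpy as np

def sinusoidal_2d_positional_encoding(height, width, dim):
    """Fixed 2D sinusoidal positional encoding: the first half of the
    channels encodes the row index, the second half the column index."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    half = dim // 2
    # Geometric frequency schedule, as in the original Transformer encoding.
    freqs = 1.0 / (10000 ** (np.arange(0, half, 2) / half))   # (half/2,)
    rows = np.arange(height)[:, None] * freqs[None, :]        # (H, half/2)
    cols = np.arange(width)[:, None] * freqs[None, :]         # (W, half/2)
    pe = np.zeros((height, width, dim))
    pe[:, :, 0:half:2] = np.sin(rows)[:, None, :]   # row channels (sin)
    pe[:, :, 1:half:2] = np.cos(rows)[:, None, :]   # row channels (cos)
    pe[:, :, half::2] = np.sin(cols)[None, :, :]    # column channels (sin)
    pe[:, :, half + 1::2] = np.cos(cols)[None, :, :]  # column channels (cos)
    return pe

pe = sinusoidal_2d_positional_encoding(8, 16, 64)
print(pe.shape)  # (8, 16, 64)
```

Such an encoding is added to the latent feature grid before attention so that tokens at different spatial positions on the page become distinguishable; the paper's "adaptive" variant presumably adjusts this to varying paragraph geometries.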
Related papers
- ScriptViT: Vision Transformer-Based Personalized Handwriting Generation [0.0]
Styled handwriting generation aims to synthesize handwritten text that looks both realistic and aligned with a specific writer's style.
We introduce a Vision Transformer-based style encoder that learns global stylistic patterns from multiple reference images.
We then integrate these style cues with the target text using a cross-attention mechanism, enabling the system to produce handwritten images that more faithfully reflect the intended style.
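The cross-attention step described above lets each text token query the style-reference features. As a hedged illustration (not the ScriptViT implementation; shapes and names are assumptions), here is a minimal NumPy sketch of scaled dot-product cross-attention between text queries and style keys/values:

```python
import numpy as np

def cross_attention(text_queries, style_keys, style_values):
    """Each text token attends over style-reference features and
    returns a style-conditioned mixture of the value vectors."""
    d = text_queries.shape[-1]
    scores = text_queries @ style_keys.T / np.sqrt(d)   # (T_text, T_style)
    scores -= scores.max(axis=-1, keepdims=True)        # numeric stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over style tokens
    return weights @ style_values                       # (T_text, d)

q = np.random.randn(5, 32)    # 5 text tokens (hypothetical)
k = np.random.randn(12, 32)   # 12 style patches from reference images
v = np.random.randn(12, 32)
out = cross_attention(q, k, v)
print(out.shape)  # (5, 32)
```

In a real model the queries, keys, and values would come from learned projections inside a Transformer block; the sketch only shows the attention arithmetic that routes style information to each character of the target text.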
arXiv Detail & Related papers (2025-11-23T06:38:23Z)
- Stroke2Sketch: Harnessing Stroke Attributes for Training-Free Sketch Generation [54.053878919317526]
Stroke2Sketch is a training-free framework that introduces cross-image stroke attention.
We develop adaptive contrast enhancement and semantic-focused attention to reinforce content preservation and foreground emphasis.
Stroke2Sketch effectively synthesizes stylistically faithful sketches, outperforming existing methods in expressive stroke control and semantic coherence.
arXiv Detail & Related papers (2025-10-18T03:07:56Z)
- DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation [41.08176249345279]
DiffInk is the first latent diffusion Transformer framework for full-line handwriting generation.
We first introduce InkVAE, a novel sequential variational autoencoder enhanced with two complementary latent-space regularization losses.
We then introduce InkDiT, a novel latent diffusion Transformer that integrates target text and reference styles to generate coherent pen trajectories.
arXiv Detail & Related papers (2025-09-28T03:58:15Z)
- Layout Stroke Imitation: A Layout Guided Handwriting Stroke Generation for Style Imitation with Diffusion Model [8.457315999229907]
This work proposes multi-scale attention features for calligraphic style imitation.
These multi-scale feature embeddings highlight the local and global style features.
We also propose a conditional diffusion model that predicts strokes, in contrast to previous work, which directly generated style images.
arXiv Detail & Related papers (2025-09-19T06:53:17Z)
- Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation [45.10015960618009]
DiffBrush is a novel diffusion-based model for handwritten text-line generation.
It excels in both style imitation and content accuracy through two key strategies.
Experiments show that DiffBrush generates high-quality text lines.
arXiv Detail & Related papers (2025-08-05T09:34:06Z)
- Calligrapher: Freestyle Text Image Customization [72.71919410487881]
Calligrapher is a novel diffusion-based framework that integrates advanced text customization with artistic typography.
By automating high-quality, visually consistent typography, Calligrapher surpasses traditional models.
arXiv Detail & Related papers (2025-06-30T17:59:06Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations [64.43387739794531]
Current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles.
We introduce DEADiff to address this issue using the following two strategies.
DEADiff attains the best visual stylization results and an optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image.
arXiv Detail & Related papers (2024-03-11T17:35:23Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing 'minimal attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer [57.6482608202409]
Textual style transfer is the task of transforming stylistic properties of text while preserving meaning.
We introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles.
We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
arXiv Detail & Related papers (2023-08-29T17:36:02Z)
- RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment [112.45442468794658]
We propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff.
In the coarse semantic re-alignment phase, a novel caption reward is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt.
The fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view.
arXiv Detail & Related papers (2023-05-31T06:59:21Z)
- Content and Style Aware Generation of Text-line Images for Handwriting Recognition [4.301658883577544]
We propose a generative method for handwritten text-line images conditioned on both visual appearance and textual content.
Our method is able to produce long text-line samples with diverse handwriting styles.
arXiv Detail & Related papers (2022-04-12T05:52:03Z)
- SLOGAN: Handwriting Style Synthesis for Arbitrary-Length and Out-of-Vocabulary Text [35.83345711291558]
We propose a novel method that can synthesize parameterized and controllable handwriting styles for arbitrary-length and out-of-vocabulary text.
We embed the text content by providing an easily obtainable printed style image, so that the diversity of the content can be flexibly achieved.
Our method can synthesize words that are not included in the training vocabulary and with various new styles.
arXiv Detail & Related papers (2022-02-23T12:13:27Z)
- Generating Handwriting via Decoupled Style Descriptors [28.31500214381889]
We introduce the Decoupled Style Descriptor model for handwriting.
It factors both character- and writer-level styles and allows our model to represent an overall greater space of styles.
In experiments, our generated results were preferred over a state-of-the-art baseline method 88% of the time.
arXiv Detail & Related papers (2020-08-26T02:52:48Z)
- Exploring Contextual Word-level Style Relevance for Unsupervised Style Transfer [60.07283363509065]
Unsupervised style transfer aims to change the style of an input sentence while preserving its original content.
We propose a novel attentional sequence-to-sequence model that exploits the relevance of each output word to the target style.
Experimental results show that our proposed model achieves state-of-the-art performance in terms of both transfer accuracy and content preservation.
arXiv Detail & Related papers (2020-05-05T10:24:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.