DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
- URL: http://arxiv.org/abs/2509.23624v1
- Date: Sun, 28 Sep 2025 03:58:15 GMT
- Title: DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
- Authors: Wei Pan, Huiguo He, Hiuyi Cheng, Yilin Shi, Lianwen Jin
- Abstract summary: DiffInk is the first latent diffusion Transformer framework for full-line handwriting generation. We first introduce InkVAE, a novel sequential variational autoencoder enhanced with two complementary latent-space regularization losses. We then introduce InkDiT, a novel latent diffusion Transformer that integrates target text and reference styles to generate coherent pen trajectories.
- Score: 41.08176249345279
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have advanced text-to-online handwriting generation (TOHG), which aims to synthesize realistic pen trajectories conditioned on textual input and style references. However, most existing methods still primarily focus on character- or word-level generation, resulting in inefficiency and a lack of holistic structural modeling when applied to full text lines. To address these issues, we propose DiffInk, the first latent diffusion Transformer framework for full-line handwriting generation. We first introduce InkVAE, a novel sequential variational autoencoder enhanced with two complementary latent-space regularization losses: (1) an OCR-based loss enforcing glyph-level accuracy, and (2) a style-classification loss preserving writing style. This dual regularization yields a semantically structured latent space where character content and writer styles are effectively disentangled. We then introduce InkDiT, a novel latent diffusion Transformer that integrates target text and reference styles to generate coherent pen trajectories. Experimental results demonstrate that DiffInk outperforms existing state-of-the-art methods in both glyph accuracy and style fidelity, while significantly improving generation efficiency. Code will be made publicly available.
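As a rough illustration of the dual regularization described in the abstract, InkVAE's training objective could plausibly combine the standard VAE terms with the OCR-based and style-classification losses as a weighted sum. The function name, weights, and values below are illustrative assumptions, not the paper's actual formulation or hyperparameters:

```python
# Hedged sketch of a dual-regularized VAE objective in the spirit of
# InkVAE. All weights and loss values are illustrative placeholders.

def inkvae_loss(recon_loss, kl_loss, ocr_loss, style_loss,
                beta=1.0, lambda_ocr=0.5, lambda_style=0.1):
    """Combine the standard VAE terms with the two latent-space
    regularizers: an OCR-based loss for glyph-level accuracy and a
    style-classification loss for preserving writer style."""
    return (recon_loss              # pen-trajectory reconstruction
            + beta * kl_loss        # latent prior regularization
            + lambda_ocr * ocr_loss
            + lambda_style * style_loss)

# Toy values to show how the weighting combines the terms:
total = inkvae_loss(recon_loss=2.0, kl_loss=0.4, ocr_loss=1.0,
                    style_loss=0.8)
print(total)  # ≈ 2.98
```

In practice the OCR term would be a recognizer loss (e.g. CTC) on decoded trajectories and the style term a classifier cross-entropy over writer identities, both computed on or through the latent space.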
Related papers
- VecGlypher: Unified Vector Glyph Generation with Language Models [49.18215716168074]
VecGlypher generates high-fidelity vector glyphs directly from text descriptions or image exemplars. It autoregressively emits SVG path tokens, avoiding intermediate representations of the target character.
arXiv Detail & Related papers (2026-02-25T00:27:23Z) - DNA: Dual-branch Network with Adaptation for Open-Set Online Handwriting Generation [28.985690380954765]
We introduce our method for online handwriting generation, where the writer's style and the characters generated during testing are unseen during training. We propose a Dual-branch Network with Adaptation (DNA), which comprises an adaptive style branch and an adaptive content branch. Our DNA model is well-suited for the unseen OHG setting, achieving state-of-the-art performance.
arXiv Detail & Related papers (2025-11-27T03:30:22Z) - ScriptViT: Vision Transformer-Based Personalized Handwriting Generation [0.0]
Styled handwriting generation aims to synthesize handwritten text that looks both realistic and aligned with a specific writer's style. We introduce a Vision Transformer-based style encoder that learns global stylistic patterns from multiple reference images. We then integrate these style cues with the target text using a cross-attention mechanism, enabling the system to produce handwritten images that more faithfully reflect the intended style.
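The cross-attention conditioning described above can be sketched minimally: text-token queries attend over style features extracted from the reference images, so each generated token mixes in the stylistic evidence most relevant to it. The dimensions and vectors below are toy assumptions, not ScriptViT's actual architecture:

```python
# Minimal pure-Python cross-attention sketch: queries from text tokens,
# keys/values from (hypothetical) style features of reference images.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Each text query mixes the style `values`, weighted by how well
    it matches each style `key` (scaled dot-product attention)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One text token attending over two style vectors:
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 1.0], [0.0, 0.0]]
mixed = cross_attention(queries, keys, values)
```

The query matching the first key pulls the output toward the first style vector, which is the mechanism that lets style cues steer individual tokens.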
arXiv Detail & Related papers (2025-11-23T06:38:23Z) - Autoregressive Styled Text Image Generation, but Make it Reliable [51.09340470015673]
This work is dedicated to developing strategies that reproduce the characteristics of a given writer, with promising results in terms of style fidelity and generalization achieved by the recently proposed Autoregressive Transformer paradigm for HTG. We rethink the autoregressive formulation by framing HTG as a multimodal prompt-conditioned generation task, tackling content-controllability issues by introducing special input tokens for better alignment with the visual ones.
arXiv Detail & Related papers (2025-10-27T11:54:23Z) - Dual Orthogonal Guidance for Robust Diffusion-based Handwritten Text Generation [55.35931633405974]
Diffusion-based Handwritten Text Generation (HTG) approaches achieve impressive results on frequent, in-vocabulary words observed at training time and on regular styles. However, they are prone to memorizing training samples and often struggle with style variability and generation clarity. We propose a novel sampling guidance strategy, Dual Orthogonal Guidance (DOG), that leverages a negatively perturbed prompt alongside the original prompt. Experimental results on the state-of-the-art DiffusionPen and One-DM demonstrate that DOG improves both content clarity and variability, even for out-of-vocabulary words and challenging writing styles.
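One plausible reading of an orthogonal guidance scheme like the one summarized above is to guide with the difference between the conditional prediction and a negatively perturbed one, but keep only the component of that difference orthogonal to the negative prediction. The exact projection DOG uses may differ; the functions and values below are illustrative assumptions:

```python
# Hedged sketch of an orthogonal-guidance update in the spirit of DOG.
# `eps_cond` / `eps_neg` stand for denoiser outputs under the original
# and negatively perturbed prompts; names and weighting are assumptions.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonal_part(v, d):
    """Component of v orthogonal to direction d."""
    scale = dot(v, d) / dot(d, d)
    return [x - scale * y for x, y in zip(v, d)]

def dog_guidance(eps_cond, eps_neg, w=2.0):
    """Push the conditional prediction along the part of the
    (conditional - negative) difference orthogonal to eps_neg."""
    diff = [c - n for c, n in zip(eps_cond, eps_neg)]
    ortho = orthogonal_part(diff, eps_neg)
    return [c + w * o for c, o in zip(eps_cond, ortho)]

out = dog_guidance([1.0, 0.0], [0.0, 1.0])
print(out)  # [3.0, 0.0]
```

The orthogonal projection is what would keep the guidance from simply amplifying the negative prompt's direction, which fits the paper's stated goal of improving clarity without collapsing variability.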
arXiv Detail & Related papers (2025-08-23T13:09:19Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation. We introduce novel methodologies and datasets to overcome these challenges. We propose MhBART, an encoder-decoder model designed to emulate human writing style. We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - Decoupling Layout from Glyph in Online Chinese Handwriting Generation [6.566541829858544]
We develop a text line layout generator and a stylized font synthesizer. The layout generator performs in-context-like learning based on the text content and the provided style references to generate positions for each glyph autoregressively. The font synthesizer, which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net-based diffusion denoiser, generates each glyph at its position while imitating the calligraphy style extracted from the given style references.
arXiv Detail & Related papers (2024-10-03T08:46:17Z) - DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [7.398476020996681]
DiffusionPen (DiffPen) is a 5-shot styled handwritten text generation approach based on Latent Diffusion Models.
Our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples.
Our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.
arXiv Detail & Related papers (2024-09-09T20:58:25Z) - Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models [13.41869920770082]
We introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions.
We enhance the attention mechanism of the diffusion model with adaptive 2D positional encoding and the conditioning mechanism.
It outperforms all existing imitation methods at both line and paragraph levels, considering combined style and content preservation.
arXiv Detail & Related papers (2024-09-01T17:33:31Z) - ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer [57.6482608202409]
Textual style transfer is the task of transforming stylistic properties of text while preserving meaning.
We introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles.
We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
arXiv Detail & Related papers (2023-08-29T17:36:02Z) - POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
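The coarse-to-fine insertion process above can be illustrated with a toy loop: generation starts from a keyword skeleton, and each round inserts a token (or nothing) into every gap in parallel until no gap accepts a new token. The real POINTER model predicts fillers with a pretrained Transformer; the `propose` function here is a hand-written stand-in with a fixed gap dictionary, purely for illustration:

```python
# Toy illustration of POINTER-style progressive insertion.

def insertion_round(tokens, propose):
    """Insert the proposed filler (if any) into every gap, in parallel."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        filler = propose(tokens, i)
        if filler is not None:
            out.append(filler)
    return out

def generate(skeleton, propose, max_rounds=10):
    tokens = list(skeleton)
    for _ in range(max_rounds):
        nxt = insertion_round(tokens, propose)
        if nxt == tokens:  # no gap received a token: generation is done
            break
        tokens = nxt
    return tokens

# Hypothetical coarse-to-fine fillers for the skeleton ["cat", "mat"]:
def propose(tokens, i):
    gaps = {("cat", "mat"): "the", ("cat", "the"): "sat", ("sat", "the"): "on"}
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    return gaps.get((tokens[i], nxt))

print(generate(["cat", "mat"], propose))
# ['cat', 'sat', 'on', 'the', 'mat']
```

Because every gap is filled in the same round, each round is a single parallel decoding step, which is where the approach gets its efficiency over left-to-right generation.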
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.