ScriptViT: Vision Transformer-Based Personalized Handwriting Generation
- URL: http://arxiv.org/abs/2511.18307v1
- Date: Sun, 23 Nov 2025 06:38:23 GMT
- Title: ScriptViT: Vision Transformer-Based Personalized Handwriting Generation
- Authors: Sajjan Acharya, Rajendra Baskota
- Abstract summary: Styled handwriting generation aims to synthesize handwritten text that looks both realistic and aligned with a specific writer's style. We introduce a Vision Transformer-based style encoder that learns global stylistic patterns from multiple reference images. We then integrate these style cues with the target text using a cross-attention mechanism, enabling the system to produce handwritten images that more faithfully reflect the intended style.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Styled handwriting generation aims to synthesize handwritten text that looks both realistic and aligned with a specific writer's style. While recent approaches involving GAN, transformer and diffusion-based models have made progress, they often struggle to capture the full spectrum of writer-specific attributes, particularly global stylistic patterns that span long-range spatial dependencies. As a result, capturing subtle writer-specific traits such as consistent slant, curvature or stroke pressure, while keeping the generated text accurate is still an open problem. In this work, we present a unified framework designed to address these limitations. We introduce a Vision Transformer-based style encoder that learns global stylistic patterns from multiple reference images, allowing the model to better represent long-range structural characteristics of handwriting. We then integrate these style cues with the target text using a cross-attention mechanism, enabling the system to produce handwritten images that more faithfully reflect the intended style. To make the process more interpretable, we utilize Salient Stroke Attention Analysis (SSAA), which reveals the stroke-level features the model focuses on during style transfer. Together, these components lead to handwriting synthesis that is not only more stylistically coherent, but also easier to understand and analyze.
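The cross-attention conditioning described in the abstract (text-token queries attending over ViT style-patch keys and values) can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the paper's implementation: the shapes, projection sizes, and function names are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_tokens, style_tokens, d_k=16, seed=0):
    """Queries come from the target-text tokens; keys/values come from
    the ViT style encoder's patch embeddings (hypothetical shapes)."""
    rng = np.random.default_rng(seed)
    d_text = text_tokens.shape[-1]
    d_style = style_tokens.shape[-1]
    # Random projection matrices stand in for learned weights.
    W_q = rng.standard_normal((d_text, d_k)) / np.sqrt(d_text)
    W_k = rng.standard_normal((d_style, d_k)) / np.sqrt(d_style)
    W_v = rng.standard_normal((d_style, d_k)) / np.sqrt(d_style)
    Q = text_tokens @ W_q           # (T, d_k)
    K = style_tokens @ W_k          # (S, d_k)
    V = style_tokens @ W_v          # (S, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (T, S): each text token weighs style patches
    return attn @ V, attn

# Toy shapes: 5 character embeddings query 12 style-patch embeddings.
text = np.random.default_rng(1).standard_normal((5, 32))
style = np.random.default_rng(2).standard_normal((12, 64))
out, attn = cross_attention(text, style)
print(out.shape, attn.shape)  # (5, 16) (5, 12)
```

Each row of `attn` is a distribution over style patches, which is also the kind of signal a stroke-level attention analysis such as SSAA could visualize.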
Related papers
- Autoregressive Styled Text Image Generation, but Make it Reliable [51.09340470015673]
This work is dedicated to developing strategies that reproduce the characteristics of a given writer, with promising results in terms of style fidelity and generalization achieved by the recently proposed Autoregressive Transformer paradigm for HTG. In this work, we rethink the autoregressive paradigm by framing HTG as a multimodal prompt-conditioned generation task, tackling content controllability issues by introducing special input tokens for better alignment with the visual ones.
arXiv Detail & Related papers (2025-10-27T11:54:23Z) - Calligrapher: Freestyle Text Image Customization [72.71919410487881]
Calligrapher is a novel diffusion-based framework that integrates advanced text customization with artistic typography. By automating high-quality, visually consistent typography, Calligrapher surpasses traditional models.
arXiv Detail & Related papers (2025-06-30T17:59:06Z) - WriteViT: Handwritten Text Generation with Vision Transformer [7.10052009802944]
We introduce WriteViT, a one-shot handwritten text synthesis framework that incorporates Vision Transformers (ViT). WriteViT produces high-quality, style-consistent handwriting while maintaining strong recognition performance in low-resource scenarios. These results highlight the promise of transformer-based designs for multilingual handwriting generation and efficient style adaptation.
arXiv Detail & Related papers (2025-05-19T15:17:53Z) - GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing [23.64662356622401]
We present GlyphMastero, a specialized glyph encoder designed to guide the latent diffusion model for generating texts with stroke-level precision. Our method achieves an 18.02% improvement in sentence accuracy over the state-of-the-art scene text editing baseline.
arXiv Detail & Related papers (2025-05-08T03:11:58Z) - Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce the novel task of multimodal style translation, together with MuST-Bench, a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems.
In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z) - Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models [13.41869920770082]
We introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions.
We enhance the attention mechanism of the diffusion model with adaptive 2D positional encoding and the conditioning mechanism.
It outperforms all existing imitation methods at both line and paragraph levels, considering combined style and content preservation.
arXiv Detail & Related papers (2024-09-01T17:33:31Z) - ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer [57.6482608202409]
Textual style transfer is the task of transforming stylistic properties of text while preserving meaning.
We introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles.
We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
arXiv Detail & Related papers (2023-08-29T17:36:02Z) - ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer [60.6863849241972]
We learn a representation of visual artistic style more strongly disentangled from the semantic content depicted in an image.
We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics.
arXiv Detail & Related papers (2023-04-12T10:33:18Z) - Handwritten Text Generation from Visual Archetypes [25.951540903019467]
We devise a Transformer-based model for Few-Shot styled handwritten text generation.
We obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset.
arXiv Detail & Related papers (2023-03-27T14:58:20Z) - Handwriting Transformers [98.3964093654716]
We propose a transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement and global and local writing style patterns.
The proposed HWT captures the long and short range relationships within the style examples through a self-attention mechanism.
Our proposed HWT generates realistic styled handwritten text images and significantly outperforms the state of the art.
arXiv Detail & Related papers (2021-04-08T17:59:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.