The Cursive Transformer
- URL: http://arxiv.org/abs/2504.00051v1
- Date: Mon, 31 Mar 2025 03:22:27 GMT
- Title: The Cursive Transformer
- Authors: Sam Greydanus, Zachary Wimpee
- Abstract summary: We introduce a novel tokenization scheme that converts pen stroke offsets to polar coordinates, discretizes them into bins, and then turns them into sequences of tokens. With just 3,500 handwritten words and a few simple data augmentations, we are able to train a model that can generate realistic cursive handwriting.
- Score: 0.6138671548064355
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Transformers trained on tokenized text, audio, and images can generate high-quality autoregressive samples. But handwriting data, represented as sequences of pen coordinates, remains underexplored. We introduce a novel tokenization scheme that converts pen stroke offsets to polar coordinates, discretizes them into bins, and then turns them into sequences of tokens with which to train a standard GPT model. This allows us to capture complex stroke distributions without using any specialized architectures (e.g., the mixture density network or the self-advancing ASCII attention head from Graves 2014). With just 3,500 handwritten words and a few simple data augmentations, we are able to train a model that can generate realistic cursive handwriting. Our approach is simpler and more performant than previous RNN-based methods.
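A minimal sketch of the tokenization idea described in the abstract, assuming hypothetical bin counts (R_BINS, THETA_BINS) and a clipping radius r_max; the paper's actual vocabulary layout is not specified here:

```python
import numpy as np

# Hypothetical bin counts; the paper's actual vocabulary sizes are
# not given in this abstract.
R_BINS = 32      # radial (magnitude) bins
THETA_BINS = 64  # angular bins

def tokenize_strokes(offsets, r_max=50.0):
    """Convert (dx, dy) pen-stroke offsets to a flat token sequence.

    Each offset becomes two tokens: a binned radius and a binned angle.
    """
    offsets = np.asarray(offsets, dtype=np.float64)
    dx, dy = offsets[:, 0], offsets[:, 1]

    # Cartesian offsets -> polar coordinates.
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)  # in [-pi, pi]

    # Discretize into integer bins.
    r_tok = np.clip((r / r_max * R_BINS).astype(int), 0, R_BINS - 1)
    theta_tok = ((theta + np.pi) / (2 * np.pi) * THETA_BINS).astype(int)
    theta_tok = np.clip(theta_tok, 0, THETA_BINS - 1)

    # Interleave radius and angle tokens; offset the angle ids so the
    # two token types share one vocabulary without colliding.
    tokens = np.empty(2 * len(r), dtype=int)
    tokens[0::2] = r_tok
    tokens[1::2] = R_BINS + theta_tok
    return tokens.tolist()

# Example: three small pen movements.
print(tokenize_strokes([(3.0, 4.0), (-2.0, 0.0), (0.0, -5.0)]))
```

Interleaving the two token types into one flat sequence is what lets a standard GPT vocabulary and softmax head stand in for a specialized output distribution.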
Related papers
- A Transformer Based Handwriting Recognition System Jointly Using Online and Offline Features [8.419663258260671]
We introduce an end-to-end network that performs early fusion of offline images and online stroke data.
Our approach achieves state-of-the-art accuracy, exceeding previous bests by up to 1%.
arXiv Detail & Related papers (2025-06-25T08:58:47Z)
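As a rough illustration of early fusion, one might project both modalities into a shared space and combine them before the recognizer proper; the class name, layer sizes, and the assumption that image features are already aligned to stroke timesteps are all illustrative, not details from the paper:

```python
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    """Project offline image features and online stroke features into
    a shared space and sum them before the main recognizer."""
    def __init__(self, stroke_dim=3, img_dim=64, fused_dim=128):
        super().__init__()
        self.stroke_proj = nn.Linear(stroke_dim, fused_dim)
        self.img_proj = nn.Linear(img_dim, fused_dim)

    def forward(self, stroke_seq, img_feats):
        # stroke_seq: (T, stroke_dim) pen points; img_feats: (T, img_dim)
        # image features assumed aligned to stroke timesteps.
        return self.stroke_proj(stroke_seq) + self.img_proj(img_feats)

enc = EarlyFusionEncoder()
fused = enc(torch.randn(50, 3), torch.randn(50, 64))
print(fused.shape)  # torch.Size([50, 128])
```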
- GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing [23.64662356622401]
We present GlyphMastero, a specialized glyph encoder designed to guide the latent diffusion model for generating texts with stroke-level precision.
Our method achieves an 18.02% improvement in sentence accuracy over the state-of-the-art scene text editing baseline.
arXiv Detail & Related papers (2025-05-08T03:11:58Z)
- General Detection-based Text Line Recognition [15.761142324480165]
We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR).
Our approach builds on a completely different paradigm than state-of-the-art HTR methods, which rely on autoregressive decoding.
We improve state-of-the-art performances for Chinese script recognition on the CASIA v2 dataset, and for cipher recognition on the Borg and Copiale datasets.
arXiv Detail & Related papers (2024-09-25T17:05:55Z)
- GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding [52.14832976759585]
Grammatical error correction (GEC) is an important NLP task that is usually solved with autoregressive sequence-to-sequence models.
We propose a novel non-autoregressive approach to GEC that decouples the architecture into a permutation network and a decoding network.
We show that the resulting network improves over previously known non-autoregressive methods for GEC.
arXiv Detail & Related papers (2023-11-14T14:24:36Z)
- Momentum Calibration for Text Generation [86.58432361938806]
We propose MoCa (Momentum Calibration) for text generation.
MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving average generator with beam search.
arXiv Detail & Related papers (2022-12-08T13:12:10Z)
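The phrase "momentum moving average generator" suggests a generator whose weights track an exponential moving average of the online model's weights; a minimal PyTorch sketch under that assumption (the momentum value and names are illustrative, not from the paper):

```python
import copy
import torch

@torch.no_grad()
def ema_update(momentum_model, online_model, momentum=0.999):
    """Move the momentum generator's weights a small step toward the
    online model's weights (an exponential moving average)."""
    for p_m, p_o in zip(momentum_model.parameters(), online_model.parameters()):
        p_m.mul_(momentum).add_(p_o, alpha=1.0 - momentum)

# Usage sketch: the slowly evolving copy that would generate
# calibration samples with beam search.
online = torch.nn.Linear(8, 8)        # stand-in for the generator
momentum_gen = copy.deepcopy(online)  # slowly evolving copy
ema_update(momentum_gen, online)      # call after each training step
```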
- GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation [76.7772833556714]
We introduce GENIUS: a conditional text generation model using sketches as input.
GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction from sketch objective.
We show that GENIUS can be used as a strong and ready-to-use data augmentation tool for various natural language processing (NLP) tasks.
arXiv Detail & Related papers (2022-11-18T16:39:45Z)
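A "sketch" produced by extreme and selective masking can be pictured as a sentence reduced to a few kept spans; the toy function below only illustrates that idea, with the keyword selection and mask symbol chosen for illustration rather than taken from the paper:

```python
import re

MASK = "<mask>"  # placeholder; the model's actual mask symbol may differ

def make_sketch(text, keywords):
    """Reduce a sentence to a 'sketch': keep only the given keywords
    and collapse every masked-out stretch into a single mask token."""
    kept = [w if w.lower() in keywords else MASK for w in text.split()]
    sketch = " ".join(kept)
    return re.sub(rf"({re.escape(MASK)} ?)+", f"{MASK} ", sketch).strip()

sentence = "the quick brown fox jumps over the lazy dog"
print(make_sketch(sentence, {"fox", "jumps", "dog"}))
# -> "<mask> fox jumps <mask> dog"
```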
- Data Incubation -- Synthesizing Missing Data for Handwriting Recognition [16.62493361545184]
We show how a generative model can be used to build a better recognizer through the control of content and style.
We use the framework to optimize data synthesis and demonstrate significant improvement on handwriting recognition over a model trained on real data only.
arXiv Detail & Related papers (2021-10-13T21:28:18Z)
- Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
- Learning to Look Inside: Augmenting Token-Based Encoders with Character-Level Information [29.633735942273997]
XRayEmb is a method for retrofitting existing token-based models with character-level information.
We show that incorporating XRayEmb's learned vectors into sequences of pre-trained token embeddings helps performance on both autoregressive and masked pre-trained transformer architectures.
arXiv Detail & Related papers (2021-08-01T08:09:26Z)
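Retrofitting a token-based model with character-level information could look like the following sketch, where a small character encoder's output is fused with each pre-trained token embedding; the architecture and sizes are assumptions, not XRayEmb's actual design:

```python
import torch
import torch.nn as nn

class CharAugmentedEmbedding(nn.Module):
    """Fuse a character-level summary vector into a pre-trained token
    embedding (a retrofitting sketch; sizes are illustrative)."""
    def __init__(self, n_chars=128, char_dim=32, tok_dim=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_encoder = nn.GRU(char_dim, char_dim, batch_first=True)
        self.proj = nn.Linear(tok_dim + char_dim, tok_dim)

    def forward(self, tok_vec, char_ids):
        # tok_vec: (tok_dim,) pre-trained embedding for one token;
        # char_ids: (L,) ids of the token's characters.
        _, h = self.char_encoder(self.char_emb(char_ids).unsqueeze(0))
        char_vec = h.squeeze(0).squeeze(0)
        return self.proj(torch.cat([tok_vec, char_vec], dim=-1))

emb = CharAugmentedEmbedding()
out = emb(torch.randn(256), torch.tensor([104, 105]))  # token "hi"
print(out.shape)  # torch.Size([256])
```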
- Word Shape Matters: Robust Machine Translation with Visual Embedding [78.96234298075389]
We introduce a new encoding of the input symbols for character-level NLP models.
It encodes the shape of each character through the images depicting the letters when printed.
We name this new strategy visual embedding and it is expected to improve the robustness of NLP models.
arXiv Detail & Related papers (2020-10-20T04:08:03Z)
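One way to realize a visual embedding is to render each character's printed glyph to a small bitmap and use its pixels as the symbol's vector, so visually similar characters land nearby; a minimal sketch with Pillow, where the default font and image size are illustrative:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def glyph_embedding(ch, size=16):
    """Render one character to a size x size grayscale bitmap and
    flatten it into a unit-range vector."""
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # a real system font could be used
    draw.text((2, 2), ch, fill=0, font=font)
    return np.asarray(img, dtype=np.float32).reshape(-1) / 255.0

# Characters with similar printed shapes yield nearby vectors.
a, o = glyph_embedding("a"), glyph_embedding("o")
print(a.shape, float(np.linalg.norm(a - o)))
```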
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation [0.9542023122304099]
We present ScrabbleGAN, a semi-supervised approach to synthesize handwritten text images.
ScrabbleGAN relies on a novel generative model which can generate images of words with an arbitrary length.
arXiv Detail & Related papers (2020-03-23T21:41:19Z)
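Arbitrary-length generation can be pictured as producing one image strip per character and concatenating the strips horizontally, so output width grows with word length; the toy generator below illustrates only that idea and is not ScrabbleGAN's actual convolutional architecture:

```python
import torch
import torch.nn as nn

class WordImageGenerator(nn.Module):
    """Generate a word image as one strip per character; width scales
    with word length. All sizes here are illustrative."""
    def __init__(self, vocab=80, z_dim=32, strip_w=16, h=32):
        super().__init__()
        self.char_emb = nn.Embedding(vocab, z_dim)
        # Maps one (noise + char) code to an h x strip_w grayscale strip.
        self.to_strip = nn.Sequential(nn.Linear(2 * z_dim, h * strip_w),
                                      nn.Tanh())
        self.h, self.strip_w = h, strip_w

    def forward(self, char_ids):
        z = torch.randn(len(char_ids), self.char_emb.embedding_dim)
        codes = torch.cat([z, self.char_emb(char_ids)], dim=-1)
        strips = self.to_strip(codes).view(-1, self.h, self.strip_w)
        return torch.cat(list(strips), dim=-1)  # concatenate strips

gen = WordImageGenerator()
img = gen(torch.tensor([3, 7, 1, 1, 9]))  # a five-character "word"
print(img.shape)  # torch.Size([32, 80])
```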
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.