InkSight: Offline-to-Online Handwriting Conversion by Learning to Read
and Write
- URL: http://arxiv.org/abs/2402.05804v2
- Date: Wed, 21 Feb 2024 00:19:06 GMT
- Title: InkSight: Offline-to-Online Handwriting Conversion by Learning to Read
and Write
- Authors: Blagoj Mitrevski, Arina Rak, Julian Schnitzler, Chengkun Li, Andrii
Maksai, Jesse Berent, Claudiu Musat
- Abstract summary: InkSight aims to empower physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting).
Our approach combines reading and writing priors, allowing a model to be trained without large amounts of paired samples.
Our human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered a valid tracing of the input image.
- Score: 7.827729986700937
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Digital note-taking is gaining popularity, offering a durable, editable, and
easily indexable way of storing notes in vectorized form, known as digital
ink. However, a substantial gap remains between this way of note-taking and
traditional pen-and-paper note-taking, a practice still favored by a vast
majority. Our work, InkSight, aims to bridge the gap by empowering physical
note-takers to effortlessly convert their work (offline handwriting) to digital
ink (online handwriting), a process we refer to as Derendering. Prior research
on the topic has focused on the geometric properties of images, resulting in
limited generalization beyond their training domains. Our approach combines
reading and writing priors, allowing a model to be trained without large
amounts of paired samples, which are difficult to obtain. To our knowledge,
this is the first work that effectively derenders handwritten text in arbitrary
photos with diverse visual characteristics and backgrounds. Furthermore, it
generalizes beyond its training domain into simple sketches. Our human
evaluation reveals that 87% of the samples produced by our model on the
challenging HierText dataset are considered a valid tracing of the input
image and 67% look like a pen trajectory traced by a human. Interactive
visualizations of 100 word-level model outputs for each of the three public
datasets are available in our Hugging Face space:
https://huggingface.co/spaces/Derendering/Model-Output-Playground. Model
release is in progress.
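To make the offline/online distinction concrete, here is a minimal Python sketch of the two representations; the Stroke/Ink types and the derender signature are illustrative assumptions, not the InkSight release API.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class Stroke:
    """One pen-down-to-pen-up trajectory: ordered (x, y) points."""
    points: List[Tuple[float, float]]


# Online handwriting ("digital ink") is a sequence of strokes.
Ink = List[Stroke]


def derender(image: np.ndarray) -> Ink:
    """Hypothetical offline-to-online conversion: take a raster image
    of handwriting and return a stroke sequence a pen could have
    traced to produce it. A trained InkSight-style model would go
    here; this stub only fixes the interface."""
    raise NotImplementedError("stand-in for a trained model")


# Offline: just pixels. Online: the pen trajectory survives.
offline = np.zeros((64, 256, 3), dtype=np.uint8)   # rendered word image
online: Ink = [Stroke(points=[(0.0, 0.0), (1.0, 2.0), (2.0, 1.5)])]
```

The point of the conversion is visible in the types: the online form keeps the pen trajectory, which is what makes the ink editable and indexable.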
Related papers
- Sampling and Ranking for Digital Ink Generation on a tight computational
budget [69.15275423815461]
We study ways to maximize the quality of the output of a trained digital ink generative model.
We apply and compare multiple sampling and ranking techniques in the first ablation study of its kind in the digital ink domain.
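The general sample-then-rank recipe the paper studies can be sketched as follows; sample_ink and score are hypothetical stand-ins for a trained generative model and a ranking model.

```python
import random
from typing import Callable, List, Tuple

Ink = List[Tuple[float, float]]  # simplified: one flat point sequence


def sample_and_rank(
    sample_ink: Callable[[float], Ink],
    score: Callable[[Ink], float],
    num_candidates: int = 8,
    temperature: float = 0.9,
) -> Ink:
    """Draw several candidates from a generative model and keep the
    one the ranking model scores highest. On a tight budget,
    num_candidates is the knob trading compute for quality."""
    candidates = [sample_ink(temperature) for _ in range(num_candidates)]
    return max(candidates, key=score)


# Toy stand-ins: the "model" emits random points, the "ranker"
# prefers ink closer to the origin. Real models replace both.
toy_model = lambda t: [(t * random.random(), t * random.random())
                       for _ in range(5)]
toy_ranker = lambda ink: -sum(x + y for x, y in ink)
best = sample_and_rank(toy_model, toy_ranker)
```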
arXiv Detail & Related papers (2023-06-02T09:55:15Z)
- DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion [10.75789076591325]
We introduce a novel method to automatically generate artistic typography by stylizing one or more letter fonts.
Our approach utilizes large language models to bridge texts and visual images for stylization and builds an unsupervised generative model.
arXiv Detail & Related papers (2023-03-16T19:12:52Z)
- Character-Aware Models Improve Visual Text Rendering [57.19915686282047]
Current image generation models struggle to reliably produce well-formed visual text.
Character-aware models provide large gains on a novel spelling task.
Our models set a much higher state of the art on visual spelling, with accuracy gains of more than 30 points over competitors on rare words.
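A toy illustration of why subword tokenization hides spelling from a model while a character-aware view exposes it; the greedy tokenizer and vocabulary here are invented for the example.

```python
def greedy_subword(word: str, vocab: set) -> list:
    """Toy longest-match subword tokenizer: the model sees opaque
    chunks, so the identity of individual letters is hidden."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens


vocab = {"hand", "writ", "ing"}
print(greedy_subword("handwriting", vocab))  # ['hand', 'writ', 'ing']
print(list("handwriting"))  # character-aware view exposes every letter
```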
arXiv Detail & Related papers (2022-12-20T18:59:23Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
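A minimal NumPy sketch of the contrastive objective family this belongs to: same-author text pairs are pulled together, other authors in the batch act as negatives. The encoder outputs, batch construction, and temperature are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np


def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.07) -> float:
    """Contrastive loss over a batch: row i of `anchors` and row i of
    `positives` embed two texts by the same author; every other row
    in the batch serves as a negative (a different author)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())       # maximize the diagonal


rng = np.random.default_rng(0)
batch = rng.normal(size=(16, 128))                 # stand-in encoder outputs
loss = info_nce_loss(batch, batch + 0.1 * rng.normal(size=batch.shape))
```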
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- Drawing out of Distribution with Neuro-Symbolic Generative Models [49.79371715591122]
Drawing out of Distribution is a neuro-symbolic generative model of stroke-based drawing.
DooD operates directly on images and requires neither supervision nor expensive test-time inference.
We evaluate DooD on its ability to generalise across both data and tasks.
arXiv Detail & Related papers (2022-06-03T21:40:22Z)
- Learning to Prompt for Vision-Language Models [82.25005817904027]
Vision-language pre-training has emerged as a promising alternative for representation learning.
It shifts from the tradition of learning a fixed set of weights from images and discrete labels, seen as visual concepts, to aligning images and raw text with two separate encoders.
Such a paradigm benefits from a broader source of supervision and allows zero-shot transfer to downstream tasks.
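A hedged sketch of the two-encoder, zero-shot pattern described above, with the learnable-context idea behind prompt learning at the end; all encoders here are random stand-ins, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB = 64


def embed_tokens(text: str) -> np.ndarray:
    """Stand-in token embedding table: one random vector per word."""
    return np.stack([rng.normal(size=EMB) for _ in text.split()])


def encode_text(token_embs: np.ndarray) -> np.ndarray:
    """Stand-in text encoder: mean-pool the token embeddings."""
    return token_embs.mean(axis=0)


classes = ["cat", "dog", "pen"]
image_vec = rng.normal(size=EMB)  # stand-in image-encoder output

# Zero-shot transfer: pick the class whose prompt embeds closest
# to the image embedding (cosine similarity).
text_vecs = [encode_text(embed_tokens(f"a photo of a {c}")) for c in classes]
sims = [v @ image_vec / (np.linalg.norm(v) * np.linalg.norm(image_vec))
        for v in text_vecs]
prediction = classes[int(np.argmax(sims))]

# Prompt learning replaces the hand-written "a photo of a" with a few
# context vectors that would be optimized by gradient descent.
context = rng.normal(size=(4, EMB))
prompted = [encode_text(np.vstack([context, embed_tokens(c)]))
            for c in classes]
```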
arXiv Detail & Related papers (2021-09-02T17:57:31Z)
- Few Shots Is All You Need: A Progressive Few Shot Learning Approach for Low Resource Handwriting Recognition [1.7491858164568674]
We propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation effort.
Our model detects all symbols of a given alphabet in a text-line image; a decoding step then maps the symbol similarity scores to the final sequence of transcribed symbols.
Since retraining on a new alphabet would require annotating thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach.
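The detect-then-decode step can be sketched as follows; the score matrix layout, threshold, and greedy decoding are illustrative assumptions rather than the paper's actual decoder.

```python
import numpy as np

ALPHABET = list("abcdefghijklmnopqrstuvwxyz")


def decode_textline(scores: np.ndarray, threshold: float = 0.5) -> str:
    """`scores` has one row per detected symbol, in reading order, and
    one column per alphabet entry (similarity to few-shot exemplars).
    Keep the best match per detection; drop low-confidence ones."""
    out = []
    for row in scores:
        best = int(np.argmax(row))
        if row[best] >= threshold:
            out.append(ALPHABET[best])
    return "".join(out)


rng = np.random.default_rng(1)
fake_scores = rng.random((5, len(ALPHABET)))  # 5 detections in a text line
text = decode_textline(fake_scores)
```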
arXiv Detail & Related papers (2021-07-21T13:18:21Z)
- CharacterGAN: Few-Shot Keypoint Character Animation and Reposing [64.19520387536741]
We introduce CharacterGAN, a generative model that can be trained on only a few samples of a given character.
Our model generates novel poses based on keypoint locations, which can be modified in real time while providing interactive feedback.
We show that our approach outperforms recent baselines and creates realistic animations for diverse characters.
arXiv Detail & Related papers (2021-02-05T12:38:15Z)
- Spectral Graph-based Features for Recognition of Handwritten Characters: A Case Study on Handwritten Devanagari Numerals [0.0]
We propose an approach that exploits robust graph representations and the spectral graph embedding concept to represent handwritten characters.
To corroborate the efficacy of the proposed method, extensive experiments were carried out on the standard handwritten numeral dataset of the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata.
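A minimal sketch of spectral features for a character graph, assuming the skeleton has already been reduced to nodes (keypoints) and edges (strokes); the Laplacian spectrum used below is one standard spectral embedding and not necessarily the paper's exact construction.

```python
import numpy as np


def laplacian_spectrum(adjacency: np.ndarray, k: int = 8) -> np.ndarray:
    """Return the k smallest eigenvalues of the graph Laplacian
    L = D - A as a fixed-length, permutation-invariant feature
    vector for a character's graph (zero-padded if the graph has
    fewer than k nodes)."""
    degree = np.diag(adjacency.sum(axis=1))
    eigvals = np.sort(np.linalg.eigvalsh(degree - adjacency))
    feats = np.zeros(k)
    m = min(k, eigvals.shape[0])
    feats[:m] = eigvals[:m]
    return feats


# Toy skeleton graph: four keypoints connected in a path.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
features = laplacian_spectrum(A)  # feed to any standard classifier
```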
arXiv Detail & Related papers (2020-07-07T08:40:08Z)
- ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation [0.9542023122304099]
We present ScrabbleGAN, a semi-supervised approach to synthesize handwritten text images.
ScrabbleGAN relies on a novel generative model which can generate images of words of arbitrary length.
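One way a generator can produce arbitrary-width word images is to give each character its own latent "column" and let image width grow with word length; the shapes and embedding scheme below are illustrative, not ScrabbleGAN's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
CHAR_EMB = {c: rng.normal(size=16) for c in "abcdefghijklmnopqrstuvwxyz"}


def word_latent(word: str, noise_dim: int = 16) -> np.ndarray:
    """Build a (len(word), feat) latent: each character contributes
    one column, concatenated with shared style noise, so the width
    of the latent (and of the image a convolutional generator would
    upsample it into) grows with word length."""
    style = rng.normal(size=noise_dim)  # shared noise keeps style coherent
    return np.stack([np.concatenate([CHAR_EMB[c], style]) for c in word])


short = word_latent("cat")          # (3, 32)  -> narrow image
long_ = word_latent("handwriting")  # (11, 32) -> wide image
```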
arXiv Detail & Related papers (2020-03-23T21:41:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.