Sampling and Ranking for Digital Ink Generation on a tight computational
budget
- URL: http://arxiv.org/abs/2306.03103v1
- Date: Fri, 2 Jun 2023 09:55:15 GMT
- Title: Sampling and Ranking for Digital Ink Generation on a tight computational
budget
- Authors: Andrei Afonin, Andrii Maksai, Aleksandr Timofeev, and Claudiu Musat
- Abstract summary: We study ways to maximize the quality of the output of a trained digital ink generative model.
We use and compare the effect of multiple sampling and ranking techniques, in the first ablation study of its kind in the digital ink domain.
- Score: 69.15275423815461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Digital ink (online handwriting) generation has a number of potential
applications for creating user-visible content, such as handwriting
autocompletion, spelling correction, and beautification. Writing is personal,
and processing is usually done on-device. Ink generative models thus need
to produce high-quality content quickly, in a resource-constrained environment.
In this work, we study ways to maximize the quality of the output of a
trained digital ink generative model, while staying within an inference time
budget. We use and compare the effect of multiple sampling and ranking
techniques, in the first ablation study of its kind in the digital ink domain.
We confirm our findings on multiple datasets - writing in English and
Vietnamese, as well as mathematical formulas - using two model types and two
common ink data representations. In all combinations, we report a meaningful
improvement in the recognizability of the synthetic inks, in some cases more
than halving the character error rate metric, and describe a way to select the
optimal combination of sampling and ranking techniques for any given
computational budget.
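The sample-and-rank scheme the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the actual models, recognizer-based rankers, and budget accounting are not specified here, and `sample_fn`, `score_fn`, and `max_candidates` are placeholder names. The idea is simply to draw candidate inks from a trained generative model until an inference-time budget runs out, rank them (e.g. by a recognizer's score, the inverse of character error rate), and return the best one.

```python
import time
import random

def sample_and_rank(sample_fn, score_fn, budget_s, max_candidates=16):
    # Draw candidates until either the time budget or the candidate
    # cap is exhausted, ranking them as we go; return the best-ranked.
    deadline = time.monotonic() + budget_s
    best, best_score = None, float("-inf")
    for _ in range(max_candidates):
        candidate = sample_fn()
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if time.monotonic() >= deadline:
            break
    return best

# Toy stand-ins: a "model" that samples random numbers and a
# "ranker" that prefers larger ones (both purely illustrative).
random.seed(0)
drawn = []

def toy_sampler():
    x = random.random()
    drawn.append(x)
    return x

result = sample_and_rank(toy_sampler, lambda c: c, budget_s=0.05)
```

Under this scheme, the trade-off the paper studies corresponds to choosing the sampling temperature/strategy inside `sample_fn`, the ranking function `score_fn`, and the candidate count that fits the budget.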
Related papers
- InkSight: Offline-to-Online Handwriting Conversion by Learning to Read
and Write [7.827729986700937]
InkSight aims to empower physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting).
Our approach combines reading and writing priors, allowing training a model in the absence of large amounts of paired samples.
Our human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered as a valid tracing of the input image.
arXiv Detail & Related papers (2024-02-08T16:41:41Z) - DSS: Synthesizing long Digital Ink using Data augmentation, Style
encoding and Split generation [47.90135553071684]
We show that the commonly used models for this task fail to generalize to long-form data.
These methods use a contrastive learning technique and are tailored specifically to the handwriting domain.
arXiv Detail & Related papers (2023-11-29T16:33:19Z) - Improving the Generation Quality of Watermarked Large Language Models
via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can lead to a downgraded text quality.
We propose to improve the quality of texts generated by a watermarked language model by Watermarking with Importance Scoring (WIS).
arXiv Detail & Related papers (2023-11-16T08:36:00Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model that learns authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Improving Accuracy and Explainability of Online Handwriting Recognition [0.9176056742068814]
We develop handwriting recognition models on the OnHW-chars dataset and improve the accuracy of previous models.
Our results are verifiable and reproducible via the provided public repository.
arXiv Detail & Related papers (2022-09-14T21:28:14Z) - Quark: Controllable Text Generation with Reinforced Unlearning [68.07749519374089]
Large-scale language models often learn behaviors that are misaligned with user expectations.
We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property.
For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods.
arXiv Detail & Related papers (2022-05-26T21:11:51Z) - Structural Information Preserving for Graph-to-Text Generation [59.00642847499138]
The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs.
We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information.
Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.
arXiv Detail & Related papers (2021-02-12T20:09:01Z) - Offline Handwritten Chinese Text Recognition with Convolutional Neural
Networks [5.984124397831814]
In this paper, we build the models using only the convolutional neural networks and use CTC as the loss function.
We achieve 6.81% character error rate (CER) on the ICDAR 2013 competition set, which is the best published result without language model correction.
arXiv Detail & Related papers (2020-06-28T14:34:38Z) - FCN+RL: A Fully Convolutional Network followed by Refinement Layers to
Offline Handwritten Signature Segmentation [3.3144312096837325]
We propose an approach to locate and extract the pixels of handwritten signatures on identification documents.
The technique is based on a fully convolutional encoder-decoder network combined with a block of refinement layers for the alpha channel of the predicted image.
arXiv Detail & Related papers (2020-05-28T18:47:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.