A Transformer Architecture for Online Gesture Recognition of
Mathematical Expressions
- URL: http://arxiv.org/abs/2211.02643v1
- Date: Fri, 4 Nov 2022 17:55:55 GMT
- Title: A Transformer Architecture for Online Gesture Recognition of
Mathematical Expressions
- Authors: Mirco Ramo and Guénolé C.M. Silvestre
- Abstract summary: The Transformer architecture is shown to provide an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes.
The attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions.
For the first time, the encoder is fed with spatio-temporal data tokens potentially forming an infinitely large vocabulary.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer architecture is shown to provide a powerful framework as an
end-to-end model for building expression trees from online handwritten gestures
corresponding to glyph strokes. In particular, the attention mechanism was
successfully used to encode, learn and enforce the underlying syntax of
expressions creating latent representations that are correctly decoded to the
exact mathematical expression tree, providing robustness to ablated inputs and
unseen glyphs. For the first time, the encoder is fed with spatio-temporal data
tokens potentially forming an infinitely large vocabulary, which finds
applications beyond that of online gesture recognition. A new supervised
dataset of online handwriting gestures is provided for training models on
generic handwriting recognition tasks and a new metric is proposed for the
evaluation of the syntactic correctness of the output expression trees. A small
Transformer model suitable for edge inference was successfully trained to an
average normalised Levenshtein accuracy of 94%, resulting in valid postfix RPN
tree representation for 94% of predictions.
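To make the input side concrete, here is a minimal sketch (not the authors' released code) of an encoder-decoder Transformer whose encoder consumes continuous spatio-temporal stroke tokens through a linear projection, instead of looking up a fixed symbol vocabulary, while the decoder emits postfix (RPN) tokens. The module names, the per-point feature layout (x, y, time delta, pen state) and all hyperparameters are illustrative assumptions; positional encodings and the training loop are omitted for brevity.

```python
# Hedged sketch in PyTorch: continuous stroke tokens in, RPN symbol logits out.
import torch
import torch.nn as nn

class StrokeToRPNTransformer(nn.Module):
    def __init__(self, point_dim=4, d_model=128, nhead=4, num_layers=2, rpn_vocab=64):
        super().__init__()
        # Each input token is a raw point (x, y, dt, pen_state): the "vocabulary"
        # is unbounded, so a linear projection replaces the usual embedding lookup.
        self.in_proj = nn.Linear(point_dim, d_model)
        self.out_embed = nn.Embedding(rpn_vocab, d_model)   # discrete RPN symbols
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, rpn_vocab)

    def forward(self, strokes, rpn_prefix):
        # strokes: (batch, n_points, point_dim); rpn_prefix: (batch, n_out) token ids
        src = self.in_proj(strokes)
        tgt = self.out_embed(rpn_prefix)
        causal = self.transformer.generate_square_subsequent_mask(rpn_prefix.size(1))
        out = self.transformer(src, tgt, tgt_mask=causal)
        return self.head(out)   # logits over RPN symbols at each output position

model = StrokeToRPNTransformer()
logits = model(torch.randn(2, 50, 4), torch.randint(0, 64, (2, 7)))
print(logits.shape)   # torch.Size([2, 7, 64])
```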
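In the same spirit, the two evaluation ideas mentioned in the abstract, normalised Levenshtein accuracy and syntactic validity of the predicted postfix (RPN) sequence, admit a short sketch. The normalisation by the longer sequence length and the example operator arities are assumptions for illustration, not necessarily the paper's exact definitions.

```python
# Hedged sketch: sequence-level metrics over predicted vs. reference RPN tokens.

def levenshtein(a, b):
    """Dynamic-programming edit distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def normalised_levenshtein_accuracy(pred, ref):
    return 1.0 - levenshtein(pred, ref) / max(len(pred), len(ref), 1)

def is_valid_rpn(tokens, arity):
    """A postfix sequence is valid iff stack evaluation ends with exactly one tree."""
    depth = 0
    for t in tokens:
        k = arity.get(t, 0)        # operands have arity 0
        if depth < k:
            return False           # operator without enough operands
        depth += 1 - k
    return depth == 1

ARITY = {"+": 2, "*": 2}                                          # illustrative arities
print(normalised_levenshtein_accuracy(list("23+"), list("23*")))  # ~0.667
print(is_valid_rpn(["2", "3", "+", "4", "*"], ARITY))             # True  -> (2+3)*4
print(is_valid_rpn(["2", "+", "3"], ARITY))                       # False -> infix, not RPN
```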
Related papers
- On Eliciting Syntax from Language Models via Hashing [19.872554909401316]
Unsupervised parsing aims to infer syntactic structure from raw text.
In this paper, we explore the possibility of leveraging pre-trained language models to deduce parsing trees from raw text.
We show that our method is effective and efficient enough to acquire high-quality parsing trees from pre-trained language models at a low cost.
arXiv Detail & Related papers (2024-10-05T08:06:19Z)
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer [51.260384040953326]
Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios.
We propose a position forest transformer (PosFormer) for HMER, which jointly optimizes two tasks: expression recognition and position recognition.
PosFormer consistently outperforms the state-of-the-art methods, with gains of 2.03%/1.22%/2.00%, 1.83%, and 4.62% on the respective benchmark datasets.
arXiv Detail & Related papers (2024-07-10T15:42:58Z)
- Self-Supervised Representation Learning for Online Handwriting Text Classification [0.8594140167290099]
We propose a novel pretext task, Part of Stroke Masking (POSM), for pretraining models to extract informative representations from individuals' online handwriting in English and Chinese.
To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods.
The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
arXiv Detail & Related papers (2023-10-10T14:07:49Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Online Gesture Recognition using Transformer and Natural Language Processing [0.0]
The Transformer architecture is shown to provide a powerful framework for recognising online gestures corresponding to glyph strokes of natural language sentences.
arXiv Detail & Related papers (2023-05-05T10:17:22Z)
- Syntax-Aware Network for Handwritten Mathematical Expression Recognition [53.130826547287626]
Handwritten mathematical expression recognition (HMER) is a challenging task that has many potential applications.
Recent methods for HMER have achieved outstanding performance with an encoder-decoder architecture.
We propose a simple and efficient method for HMER, which is the first to incorporate syntax information into an encoder-decoder network.
arXiv Detail & Related papers (2022-03-03T09:57:19Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
- Word Shape Matters: Robust Machine Translation with Visual Embedding [78.96234298075389]
We introduce a new encoding of the input symbols for character-level NLP models.
It encodes the shape of each character through images of the letters as they appear when printed.
We name this new strategy visual embedding; it is expected to improve the robustness of NLP models (a brief illustrative sketch follows this list).
arXiv Detail & Related papers (2020-10-20T04:08:03Z)
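As an aside on the visual-embedding entry above: a minimal sketch, assuming Pillow and NumPy are available, of representing a character by the pixels of its printed glyph rather than by an index into a symbol table. It only illustrates the idea summarised in that entry, not the paper's implementation; the font, bitmap size and normalisation are arbitrary choices here.

```python
# Hedged sketch of a "visual embedding": rasterise a character and flatten the bitmap.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def visual_embedding(ch, size=16):
    img = Image.new("L", (size, size), color=0)          # blank grayscale canvas
    draw = ImageDraw.Draw(img)
    draw.text((2, 2), ch, fill=255, font=ImageFont.load_default())
    pixels = np.asarray(img, dtype=np.float32) / 255.0   # scale to [0, 1]
    return pixels.flatten()                              # (size*size,) vector

# Visually similar glyphs (e.g. "o" and "0") end up with nearby embeddings.
e_o, e_zero = visual_embedding("o"), visual_embedding("0")
print(e_o.shape, float(np.linalg.norm(e_o - e_zero)))
```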