Online Gesture Recognition using Transformer and Natural Language
Processing
- URL: http://arxiv.org/abs/2305.03407v1
- Date: Fri, 5 May 2023 10:17:22 GMT
- Title: Online Gesture Recognition using Transformer and Natural Language
Processing
- Authors: G.C.M. Silvestre, F. Balado, O. Akinremi and M. Ramo
- Abstract summary: Transformer architecture is shown to provide a powerful machine framework for online gestures corresponding to glyph strokes of natural language sentences.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer architecture is shown to provide a powerful machine
transduction framework for online handwritten gestures corresponding to glyph
strokes of natural language sentences. The attention mechanism is successfully
used to create latent representations of an end-to-end encoder-decoder model,
solving multi-level segmentation while also learning some language features and
syntax rules. The additional use of a large decoding space with some learned
Byte-Pair-Encoding (BPE) is shown to provide robustness to ablated inputs and
syntax rules. The encoder stack was directly fed with spatio-temporal data
tokens potentially forming an infinitely large input vocabulary, an approach
that finds applications beyond that of this work. Encoder transfer-learning
capabilities are also demonstrated on several languages, resulting in faster
optimisation and shared parameters. A new supervised dataset of online
handwriting gestures suitable for generic handwriting recognition tasks was
used to successfully train a small transformer model to an average normalised
Levenshtein accuracy of 96% on English or German sentences and 94% on French sentences.
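The two ideas the abstract leans on can be made concrete with a short sketch: an encoder fed continuous spatio-temporal points through a linear projection (so the input vocabulary is effectively unbounded) and a decoder over a learned BPE vocabulary, evaluated with a normalised Levenshtein accuracy. The paper publishes no code here, so the (x, y, pen state, Δt) feature layout, model sizes, and all names below are assumptions.

```python
# Minimal sketch (PyTorch) of the paper's central idea: the encoder consumes
# continuous stroke samples via a learned linear projection rather than an
# embedding lookup, while the decoder predicts discrete BPE subwords.
# Dimensions and the (x, y, pen-state, dt) point layout are assumptions.
import torch
import torch.nn as nn

class GestureTransducer(nn.Module):
    def __init__(self, point_dim=4, d_model=128, bpe_vocab=8000,
                 nhead=4, num_layers=3):
        super().__init__()
        self.input_proj = nn.Linear(point_dim, d_model)    # continuous tokens in
        self.bpe_embed = nn.Embedding(bpe_vocab, d_model)  # discrete tokens out
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, bpe_vocab)

    def forward(self, points, bpe_prefix):
        # points: (batch, n_points, point_dim); bpe_prefix: (batch, n_tokens)
        # Positional encodings are omitted for brevity.
        src = self.input_proj(points)
        tgt = self.bpe_embed(bpe_prefix)
        causal = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(src, tgt, tgt_mask=causal)
        return self.out(h)  # logits over the BPE vocabulary

def normalised_levenshtein_accuracy(pred: str, ref: str) -> float:
    # 1 - edit_distance / len(ref): our reading of the reported metric.
    d = list(range(len(ref) + 1))
    for i, p in enumerate(pred, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (p != r))
    return 1.0 - d[len(ref)] / max(len(ref), 1)

model = GestureTransducer()
logits = model(torch.randn(2, 50, 4), torch.randint(0, 8000, (2, 10)))
print(logits.shape)                                  # torch.Size([2, 10, 8000])
print(normalised_levenshtein_accuracy("kitten", "sitting"))  # 1 - 3/7 = 0.571
```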
Related papers
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
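The dynamic encoding-length mechanism of DVA-VAE in the T2S-GPT entry above is specific to that paper, but the vector-quantization bottleneck it builds on can be sketched generically; everything below (names, sizes) is illustrative, not the paper's recipe.

```python
# Hypothetical sketch of a plain vector-quantization bottleneck, the building
# block behind DVA-VAE; the dynamic encoding-length part is not reproduced.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, codebook_size=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):                      # z: (batch, seq, dim)
        # Distance of every input vector to every codebook vector.
        dists = torch.cdist(z, self.codebook.weight.unsqueeze(0))
        codes = dists.argmin(dim=-1)           # (batch, seq) discrete codes
        quantised = self.codebook(codes)
        # Straight-through estimator lets gradients bypass the argmin.
        quantised = z + (quantised - z).detach()
        return quantised, codes

vq = VectorQuantizer()
q, codes = vq(torch.randn(2, 10, 64))
print(q.shape, codes.shape)   # torch.Size([2, 10, 64]) torch.Size([2, 10])
```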
- A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions [0.0]
Transformer architecture is shown to provide an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes.
The attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions.
For the first time, the encoder is fed with spatio-temporal data tokens potentially forming an infinitely large vocabulary.
arXiv Detail & Related papers (2022-11-04T17:55:55Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
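Cross-modal distillation of the kind XDBERT describes is often implemented by matching a student's hidden states to a frozen teacher's; the MSE objective and dimensions below are an assumed stand-in, not the paper's exact recipe.

```python
# Hypothetical sketch of cross-modal distillation: a language-only student is
# nudged toward the hidden states of a frozen multimodal teacher.
import torch
import torch.nn as nn

teacher_dim, student_dim = 768, 768
project = nn.Linear(student_dim, teacher_dim)  # aligns spaces if they differ
mse = nn.MSELoss()

def distillation_loss(student_hidden, teacher_hidden):
    # student_hidden, teacher_hidden: (batch, seq, dim)
    return mse(project(student_hidden), teacher_hidden.detach())

loss = distillation_loss(torch.randn(2, 16, 768), torch.randn(2, 16, 768))
print(loss.item())
```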
- Sentence Bottleneck Autoencoders from Transformer Language Models [53.350633961266375]
We build a sentence-level autoencoder from a pretrained, frozen transformer language model.
We adapt the masked language modeling objective as a generative, denoising one, while only training a sentence bottleneck and a single-layer modified transformer decoder.
We demonstrate that the sentence representations discovered by our model achieve better quality than previous methods that extract representations from pretrained transformers on text similarity tasks, style transfer, and single-sentence classification tasks in the GLUE benchmark, while using fewer parameters than large pretrained models.
arXiv Detail & Related papers (2021-08-31T19:39:55Z)
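The bottleneck idea in the entry above can be sketched as pooling a frozen encoder's token states into one vector that conditions a shallow decoder, training only the pooling and the decoder; all components below are hypothetical simplifications.

```python
# Hypothetical sketch of a sentence-bottleneck autoencoder: a frozen encoder's
# token states are pooled into a single vector that a one-layer decoder
# attends to while reconstructing the input. Causal masking and input
# shifting are omitted for brevity.
import torch
import torch.nn as nn

d_model, vocab = 256, 1000

frozen_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
for p in frozen_encoder.parameters():
    p.requires_grad = False                   # stands in for a pretrained LM

bottleneck = nn.Linear(d_model, d_model)      # trained: the sentence vector
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=1)
embed = nn.Embedding(vocab, d_model)
out = nn.Linear(d_model, vocab)

def reconstruct_logits(tokens):               # tokens: (batch, seq)
    h = frozen_encoder(embed(tokens))
    sent = bottleneck(h.mean(dim=1, keepdim=True))   # (batch, 1, d_model)
    dec = decoder(embed(tokens), memory=sent)        # decode against 1 vector
    return out(dec)

print(reconstruct_logits(torch.randint(0, vocab, (2, 12))).shape)
# torch.Size([2, 12, 1000])
```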
- Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer [4.594159253008448]
We propose a multi-task learning-based transformer model for low-resource multilingual speech recognition for Indian languages.
We use a phoneme decoder for the phoneme recognition task and a grapheme decoder to predict grapheme sequence.
Our proposed approach can obtain significant improvement over previous approaches.
arXiv Detail & Related papers (2021-08-22T09:32:15Z)
- A Dual-Decoder Conformer for Multilingual Speech Recognition [4.594159253008448]
This work proposes a dual-decoder transformer model for low-resource multilingual speech recognition for Indian languages.
We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict grapheme sequence along with language information.
Our experiments show that we can obtain a significant reduction in WER over the baseline approaches.
arXiv Detail & Related papers (2021-08-22T09:22:28Z)
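The two conformer entries above share one pattern: a shared acoustic encoder feeding a phoneme decoder and a grapheme decoder, trained jointly. The sketch below caricatures that pattern with frame-level classification heads and an assumed loss weighting; the real models use full attention decoders.

```python
# Hypothetical sketch of the shared pattern in the two entries above: one
# acoustic encoder, separate phoneme (PHN-DEC) and grapheme (GRP-DEC) heads,
# optimised with a weighted sum of two cross-entropy losses.
import torch
import torch.nn as nn

d_model, n_phonemes, n_graphemes = 144, 50, 80
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
phn_head = nn.Linear(d_model, n_phonemes)     # stands in for PHN-DEC
grp_head = nn.Linear(d_model, n_graphemes)    # stands in for GRP-DEC
ce = nn.CrossEntropyLoss()

def multitask_loss(features, phn_targets, grp_targets, alpha=0.3):
    h = encoder(features)                     # (batch, frames, d_model)
    phn = phn_head(h).flatten(0, 1)
    grp = grp_head(h).flatten(0, 1)
    return alpha * ce(phn, phn_targets.flatten()) \
        + (1 - alpha) * ce(grp, grp_targets.flatten())

loss = multitask_loss(torch.randn(2, 20, 144),
                      torch.randint(0, 50, (2, 20)),
                      torch.randint(0, 80, (2, 20)))
print(loss.item())
```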
- Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation [127.54315184545796]
Speech translation (ST) aims to learn transformations from speech in the source language to the text in the target language.
We propose to improve the multitask ST model by utilizing word embedding as the intermediate.
arXiv Detail & Related papers (2020-05-21T14:22:35Z)
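One plausible reading of "word embedding as the intermediate" in the entry above is an auxiliary loss that regresses encoder states toward frozen word embeddings; the regression target and MSE choice below are assumptions for illustration.

```python
# Hypothetical sketch of word embeddings as a multitask intermediate: besides
# the translation loss, encoder states are pulled toward frozen word
# embeddings, providing the auxiliary signal the entry describes.
import torch
import torch.nn as nn

d_model, emb_dim, vocab = 256, 300, 1000
word_emb = nn.Embedding(vocab, emb_dim)       # pretrained and frozen in spirit
word_emb.weight.requires_grad = False
to_emb = nn.Linear(d_model, emb_dim)          # maps encoder states to that space

def embedding_intermediate_loss(encoder_states, word_ids):
    # encoder_states: (batch, seq, d_model); word_ids: (batch, seq)
    return nn.functional.mse_loss(to_emb(encoder_states), word_emb(word_ids))

aux = embedding_intermediate_loss(torch.randn(2, 8, 256),
                                  torch.randint(0, vocab, (2, 8)))
print(aux.item())  # would be added to the main translation loss
```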
- Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
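Joint recognition and translation of the kind the entry above describes is commonly trained as a CTC loss over gloss sequences on the encoder plus a cross-entropy translation loss on the decoder; the equal loss weighting and all shapes below are assumptions.

```python
# Hypothetical sketch of joint sign recognition and translation: CTC over
# encoder-side gloss predictions plus cross-entropy on decoder outputs.
import torch
import torch.nn as nn

n_gloss, n_words, d_model = 100, 1000, 256
gloss_head = nn.Linear(d_model, n_gloss)
ctc = nn.CTCLoss(blank=0)
ce = nn.CrossEntropyLoss()

def joint_loss(enc_states, dec_logits, gloss_targets, word_targets):
    # enc_states: (batch, frames, d_model); dec_logits: (batch, words, n_words)
    log_probs = gloss_head(enc_states).log_softmax(-1).transpose(0, 1)
    in_lens = torch.full((enc_states.size(0),), enc_states.size(1),
                         dtype=torch.long)
    tgt_lens = torch.full((gloss_targets.size(0),), gloss_targets.size(1),
                          dtype=torch.long)
    recognition = ctc(log_probs, gloss_targets, in_lens, tgt_lens)
    translation = ce(dec_logits.flatten(0, 1), word_targets.flatten())
    return recognition + translation

loss = joint_loss(torch.randn(2, 30, 256),
                  torch.randn(2, 7, 1000),
                  torch.randint(1, 100, (2, 5)),
                  torch.randint(0, 1000, (2, 7)))
print(loss.item())
```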
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
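BiDAN's training signal can be caricatured much like the dual-decoder sketch earlier: a shared encoder feeds one decoder that reconstructs the source and one that produces the target translation. The joint objective below, with its weighting, is a hypothetical illustration of that idea.

```python
# Hypothetical sketch of BiDAN's two-target objective: a source-reconstruction
# loss plus a translation loss, both computed from decoders over one shared
# encoder, pushing the encoder toward a language-independent space.
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def bidan_loss(src_logits, tgt_logits, src_ids, tgt_ids, beta=0.5):
    # *_logits: (batch, seq, vocab) from the two decoders; *_ids: (batch, seq)
    reconstruction = ce(src_logits.flatten(0, 1), src_ids.flatten())
    translation = ce(tgt_logits.flatten(0, 1), tgt_ids.flatten())
    return translation + beta * reconstruction

loss = bidan_loss(torch.randn(2, 6, 500), torch.randn(2, 8, 700),
                  torch.randint(0, 500, (2, 6)), torch.randint(0, 700, (2, 8)))
print(loss.item())
```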
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.