Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
- URL: http://arxiv.org/abs/2107.11317v2
- Date: Mon, 26 Jul 2021 09:13:33 GMT
- Title: Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives
- Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden
- Abstract summary: Avatar-based Sign Language Production (SLP) has traditionally done just this, building up animation from sequences of hand motions, shapes and facial expressions.
We propose splitting the SLP task into two distinct jointly-trained sub-tasks.
The first translation sub-task translates from spoken language to a latent sign language representation, with gloss supervision.
The animation sub-task aims to produce expressive sign language sequences that closely resemble the learnt spatio-temporal representation.
- Score: 37.679114155300084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is common practice to represent spoken languages at their phonetic level.
However, for sign languages, this implies breaking motion into its constituent
motion primitives. Avatar-based Sign Language Production (SLP) has
traditionally done just this, building up animation from sequences of hand
motions, shapes and facial expressions. However, more recent deep learning
based solutions to SLP have tackled the problem using a single network that
estimates the full skeletal structure.
We propose splitting the SLP task into two distinct jointly-trained
sub-tasks. The first translation sub-task translates from spoken language to a
latent sign language representation, with gloss supervision. Subsequently, the
animation sub-task aims to produce expressive sign language sequences that
closely resemble the learnt spatio-temporal representation. Using a progressive
transformer for the translation sub-task, we propose a novel Mixture of Motion
Primitives (MoMP) architecture for sign language animation. A set of distinct
motion primitives are learnt during training, that can be temporally combined
at inference to animate continuous sign language sequences.
We evaluate on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T)
dataset, presenting extensive ablation studies and showing that MoMP
outperforms baselines in user evaluations. We achieve state-of-the-art back
translation performance with an 11% improvement over competing results.
Importantly, and for the first time, we showcase stronger performance for a
full translation pipeline going from spoken language to sign than from gloss
to sign.
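
To make the two-stage design more concrete, the following is a minimal sketch of how a Mixture of Motion Primitives decoder could blend learnt primitives frame by frame. It is an illustrative assumption written in PyTorch, not the paper's released implementation; the class and parameter names (MoMPDecoder, latent_dim, pose_dim, num_primitives), the linear experts and the softmax gating are hypothetical simplifications of the architecture described above.

```python
# Hypothetical sketch of a Mixture-of-Motion-Primitives (MoMP) decoder.
# Assumed setup: a per-frame latent sign-language representation (from the
# translation sub-task) is mapped to a skeletal pose by blending K learnt
# motion primitives with frame-wise gating weights.
import torch
import torch.nn as nn


class MoMPDecoder(nn.Module):
    def __init__(self, latent_dim: int = 512, pose_dim: int = 150, num_primitives: int = 10):
        super().__init__()
        # Each primitive is a small expert mapping the latent representation
        # to a candidate pose for the current frame (linear for brevity).
        self.primitives = nn.ModuleList(
            [nn.Linear(latent_dim, pose_dim) for _ in range(num_primitives)]
        )
        # The gating network decides, per frame, how strongly each primitive
        # contributes to the final pose.
        self.gate = nn.Linear(latent_dim, num_primitives)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, time, latent_dim) latent sign representation
        weights = torch.softmax(self.gate(z), dim=-1)                      # (B, T, K)
        candidates = torch.stack([p(z) for p in self.primitives], dim=-2)  # (B, T, K, pose_dim)
        # Temporal combination: a convex blend of the primitives at every frame.
        return (weights.unsqueeze(-1) * candidates).sum(dim=-2)            # (B, T, pose_dim)


if __name__ == "__main__":
    decoder = MoMPDecoder()
    latent = torch.randn(2, 30, 512)   # two sequences of 30 frames
    poses = decoder(latent)
    print(poses.shape)                 # torch.Size([2, 30, 150])
```

Because the gating weights are recomputed for every frame, different primitives can dominate different parts of a sequence, which is the intuition behind combining primitives temporally at inference.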
Related papers
- EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event cameras can naturally perceive dynamic hand movements, providing rich manual cues for sign language tasks.
We propose an efficient transformer-based framework for event-based SLR and SLT tasks.
Our method performs favorably against existing state-of-the-art approaches with only 0.34% of the computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z)
- SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale [22.49602248323602]
A persistent challenge in sign language video processing is how we learn representations of sign language.
Our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body posture of the signer.
Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training.
arXiv Detail & Related papers (2024-06-11T03:00:41Z)
- A Data-Driven Representation for Sign Language Production [26.520016084139964]
Sign Language Production aims to automatically translate spoken language sentences into continuous sequences of sign language.
Current state-of-the-art approaches rely on scarce linguistic resources to work.
This paper introduces an innovative solution by transforming the continuous pose generation problem into a discrete sequence generation problem.
arXiv Detail & Related papers (2024-04-17T15:52:38Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT approach based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- Changing the Representation: Examining Language Representation for Neural Sign Language Production [43.45785951443149]
We apply Natural Language Processing techniques to the first step of the Neural Sign Language Production pipeline.
We use language models such as BERT and Word2Vec to create better sentence level embeddings.
We introduce Text to HamNoSys (T2H) translation, and show the advantages of using a phonetic representation for sign language translation.
arXiv Detail & Related papers (2022-09-16T12:45:29Z)
- SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods need to read the entire video before starting translation.
We propose SimulSLT, the first end-to-end simultaneous sign language translation model.
SimulSLT achieves BLEU scores that exceed the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video [43.45785951443149]
To be truly understandable by Deaf communities, an automatic Sign Language Production system must generate a photo-realistic signer.
We propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language.
A pose-conditioned human synthesis model is then introduced to generate a photo-realistic sign language video from the skeletal pose sequence.
arXiv Detail & Related papers (2020-11-19T14:31:06Z)
- Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)