Continuous 3D Multi-Channel Sign Language Production via Progressive
Transformers and Mixture Density Networks
- URL: http://arxiv.org/abs/2103.06982v1
- Date: Thu, 11 Mar 2021 22:11:17 GMT
- Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden
- Abstract summary: Sign Language Production (SLP) must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community.
We propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D sign pose sequences.
We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a Mixture Density Network (MDN) formulation to produce realistic and expressive sign pose sequences.
- Score: 37.679114155300084
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign languages are multi-channel visual languages, where signers use a
continuous 3D space to communicate. Sign Language Production (SLP), the
automatic translation from spoken to sign languages, must embody both the
continuous articulation and full morphology of sign to be truly understandable
by the Deaf community. Previous deep learning-based SLP works have produced
only a concatenation of isolated signs focusing primarily on the manual
features, leading to a robotic and non-expressive production.
In this work, we propose a novel Progressive Transformer architecture, the
first SLP model to translate from spoken language sentences to continuous 3D
multi-channel sign pose sequences in an end-to-end manner. Our transformer
network architecture introduces a counter decoding that enables variable length
continuous sequence generation by tracking the production progress over time
and predicting the end of sequence. We present extensive data augmentation
techniques to reduce prediction drift, alongside an adversarial training regime
and a Mixture Density Network (MDN) formulation to produce realistic and
expressive sign pose sequences.
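The counter decoding described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical stand-in code, not the authors' implementation: the decoder step is a dummy function, but it shows the key idea that each generation step emits both a pose frame and a progress counter in [0, 1], and that generation terminates when the counter reaches 1.0 rather than at a fixed output length.

```python
def decode_step(prev_pose, step, max_len):
    """Stand-in for one progressive-transformer decoder step.

    Returns a (pose, counter) pair. Here the pose is a dummy joint vector
    and the counter advances linearly; a trained model would predict both
    from the spoken-language encoding and the pose history.
    """
    pose = [p + 0.1 for p in prev_pose]        # dummy next 3D joint vector
    counter = min(1.0, (step + 1) / max_len)   # predicted progress in [0, 1]
    return pose, counter

def generate_sequence(init_pose, max_len=8):
    """Autoregressive generation driven by the counter, not a fixed length."""
    poses, pose, step = [], init_pose, 0
    while step < max_len:
        pose, counter = decode_step(pose, step, max_len)
        poses.append(pose)
        if counter >= 1.0:   # counter hitting 1.0 marks the end of sequence
            break
        step += 1
    return poses

seq = generate_sequence([0.0, 0.0, 0.0], max_len=8)
```

Because the counter, not a hard-coded length, stops the loop, the same decoding procedure can produce sign sequences of varying duration for different input sentences.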
We propose a back translation evaluation mechanism for SLP, presenting
benchmark quantitative results on the challenging PHOENIX14T dataset and
setting baselines for future research. We further provide a user evaluation of
our SLP model, to understand the Deaf reception of our sign pose productions.
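The Mixture Density Network formulation mentioned in the abstract can also be sketched in miniature. This is an illustrative toy, not the paper's model: a 1D Gaussian mixture over a single pose coordinate, with the negative log-likelihood of the ground truth as the training loss. In the paper's setting the network head would predict mixture weights, means, and variances per joint and per time step.

```python
import math

def mdn_nll(pis, mus, sigmas, target):
    """Negative log-likelihood of `target` under a 1D Gaussian mixture."""
    likelihood = 0.0
    for pi, mu, sigma in zip(pis, mus, sigmas):
        norm = math.exp(-0.5 * ((target - mu) / sigma) ** 2)
        likelihood += pi * norm / (sigma * math.sqrt(2 * math.pi))
    return -math.log(likelihood)

# Toy two-component mixture; these parameters are assumed for illustration.
pis, mus, sigmas = [0.5, 0.5], [0.0, 1.0], [0.2, 0.2]
loss_near_mode = mdn_nll(pis, mus, sigmas, 0.0)   # target on a mixture mode
loss_off_mode = mdn_nll(pis, mus, sigmas, 0.5)    # target between modes
```

Predicting a mixture rather than a single point lets the model represent the one-to-many nature of sign pose prediction, instead of regressing to an averaged, under-articulated pose.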
Related papers
- MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production [93.32354378820648]
We propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users.
A sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step.
Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
arXiv Detail & Related papers (2024-07-04T13:53:50Z)
- A Transformer Model for Boundary Detection in Continuous Sign Language [55.05986614979846]
The Transformer model is employed for both Isolated Sign Language Recognition and Continuous Sign Language Recognition.
The training process involves using isolated sign videos, where hand keypoint features extracted from the input video are enriched.
The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos.
arXiv Detail & Related papers (2024-02-22T17:25:01Z)
- Neural Sign Actors: A diffusion model for 3D sign language production from text [51.81647203840081]
Sign Languages (SL) serve as the primary mode of communication for the Deaf and Hard of Hearing communities.
This work makes an important step towards realistic neural sign avatars, bridging the communication gap between Deaf and hearing communities.
arXiv Detail & Related papers (2023-12-05T12:04:34Z)
- Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [43.45785951443149]
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts.
Current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences.
We tackle large-scale SLP by learning to co-articulate between dictionary signs.
We also propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos.
arXiv Detail & Related papers (2022-03-29T08:51:38Z)
- Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video [43.45785951443149]
To be truly understandable by Deaf communities, an automatic Sign Language Production system must generate a photo-realistic signer.
We propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language.
A pose-conditioned human synthesis model is then introduced to generate a photo-realistic sign language video from the skeletal pose sequence.
arXiv Detail & Related papers (2020-11-19T14:31:06Z)
- Adversarial Training for Multi-Channel Sign Language Production [43.45785951443149]
We propose an Adversarial Multi-Channel approach to Sign Language Production.
We frame sign production as a minimax game between a transformer-based Generator and a conditional Discriminator.
Our adversarial discriminator evaluates the realism of sign production conditioned on the source text, pushing the generator towards a realistic and articulate output.
arXiv Detail & Related papers (2020-08-27T23:05:54Z)
- Progressive Transformers for End-to-End Sign Language Production [43.45785951443149]
The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video.
Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences.
We propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language.
arXiv Detail & Related papers (2020-04-30T15:20:25Z)
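The minimax game between generator and discriminator described in the adversarial-training entry above can be sketched with the standard GAN losses. This is a generic illustration under assumed discriminator scores, not the conditional discriminator from the paper, which additionally conditions on the source text.

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator maximises log D(real) + log(1 - D(fake))."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss(d_fake):
    """Generator (non-saturating form) maximises log D(fake)."""
    return -math.log(d_fake)

# A confident discriminator (real -> 0.9, fake -> 0.1) incurs a low loss;
# when the generator fools it (both scores -> 0.5), its loss rises.
confident = d_loss(0.9, 0.1)
fooled = d_loss(0.5, 0.5)
```

In the SLP setting, the discriminator's pressure toward realism is what pushes the generator away from the robotic, averaged poses that plain regression tends to produce.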