Adversarial Training for Multi-Channel Sign Language Production
- URL: http://arxiv.org/abs/2008.12405v1
- Date: Thu, 27 Aug 2020 23:05:54 GMT
- Title: Adversarial Training for Multi-Channel Sign Language Production
- Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden
- Abstract summary: We propose an Adversarial Multi-Channel approach to Sign Language Production.
We frame sign production as a minimax game between a transformer-based Generator and a conditional Discriminator.
Our adversarial discriminator evaluates the realism of sign production conditioned on the source text, pushing the generator towards a realistic and articulate output.
- Score: 43.45785951443149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign Languages are rich multi-channel languages, requiring articulation of
both manual (hands) and non-manual (face and body) features in a precise,
intricate manner. Sign Language Production (SLP), the automatic translation
from spoken to sign languages, must embody this full sign morphology to be
truly understandable by the Deaf community. Previous work has mainly focused on
manual feature production, with an under-articulated output caused by
regression to the mean.
In this paper, we propose an Adversarial Multi-Channel approach to SLP. We
frame sign production as a minimax game between a transformer-based Generator
and a conditional Discriminator. Our adversarial discriminator evaluates the
realism of sign production conditioned on the source text, pushing the
generator towards a realistic and articulate output. Additionally, we fully
encapsulate sign articulators with the inclusion of non-manual features,
producing facial features and mouthing patterns.
We evaluate on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T)
dataset, and report state-of-the-art SLP back-translation performance for
manual production. We set new benchmarks for the production of multi-channel
sign to underpin future research into realistic SLP.
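As a rough illustration of the training setup the abstract describes, the sketch below pairs a transformer-based pose generator with a discriminator that scores pose sequences conditioned on the source text, alternating the two sides of the minimax game. All module definitions, dimensions, and the PyTorch framing are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of text-conditioned adversarial SLP training (illustrative only).
# Assumed shapes: text embeddings (B, T_TEXT, D), sign pose frames (B, T_POSE, P).
import torch
import torch.nn as nn

B, T_TEXT, T_POSE, D, P = 8, 16, 32, 64, 150  # illustrative sizes

class Generator(nn.Module):
    """Transformer encoder-decoder regressing continuous pose frames from text."""
    def __init__(self):
        super().__init__()
        self.transformer = nn.Transformer(d_model=D, nhead=4, batch_first=True)
        self.to_pose = nn.Linear(D, P)

    def forward(self, text_emb, pose_queries):
        return self.to_pose(self.transformer(text_emb, pose_queries))

class ConditionalDiscriminator(nn.Module):
    """Scores realism of a pose sequence conditioned on the source text."""
    def __init__(self):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(P + D, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, poses, text_emb):
        # Broadcast a pooled text context onto every pose frame before scoring.
        text_ctx = text_emb.mean(dim=1, keepdim=True).expand(-1, poses.size(1), -1)
        return self.score(torch.cat([poses, text_ctx], dim=-1)).mean(dim=1)  # (B, 1)

gen, disc = Generator(), ConditionalDiscriminator()
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

text_emb = torch.randn(B, T_TEXT, D)    # stand-in for encoded spoken-language text
real_poses = torch.randn(B, T_POSE, P)  # stand-in for ground-truth sign poses
queries = torch.randn(B, T_POSE, D)     # stand-in decoder inputs

# Discriminator step: real vs. generated, both conditioned on the same text.
fake = gen(text_emb, queries).detach()
d_loss = bce(disc(real_poses, text_emb), torch.ones(B, 1)) + \
         bce(disc(fake, text_emb), torch.zeros(B, 1))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: regression to ground truth plus the adversarial realism term.
fake = gen(text_emb, queries)
g_loss = nn.functional.mse_loss(fake, real_poses) + \
         bce(disc(fake, text_emb), torch.ones(B, 1))
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

Pairing a plain regression loss with the adversarial term is what pushes the generator away from the under-articulated, regression-to-the-mean outputs the abstract mentions.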
Related papers
- MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production [93.32354378820648]
We propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users.
A sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step.
Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
arXiv Detail & Related papers (2024-07-04T13:53:50Z)
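The MS2SL entry above describes a sequence diffusion model that generates sign predictions step by step from text or speech embeddings. Below is a minimal sketch of that idea: a reverse-diffusion loop that refines a noisy pose sequence under a conditioning embedding. The denoiser, noise schedule, and all sizes are stand-ins, not the paper's implementation.

```python
# Illustrative sketch of diffusion-style sign pose generation: starting from
# noise, a denoiser conditioned on text/speech embeddings refines the pose
# sequence step by step. Everything here is an assumed stand-in, not MS2SL code.
import torch
import torch.nn as nn

T_STEPS, B, T_POSE, P, D = 50, 4, 32, 150, 64  # illustrative sizes

class Denoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(P + D + 1, 256), nn.ReLU(), nn.Linear(256, P))

    def forward(self, noisy_poses, cond, t):
        # Append a normalized timestep feature and the pooled condition per frame.
        t_feat = torch.full_like(noisy_poses[..., :1], float(t) / T_STEPS)
        cond_seq = cond.mean(dim=1, keepdim=True).expand(-1, noisy_poses.size(1), -1)
        return self.net(torch.cat([noisy_poses, cond_seq, t_feat], dim=-1))

denoiser = Denoiser()
cond = torch.randn(B, 16, D)   # stand-in text/speech embeddings
x = torch.randn(B, T_POSE, P)  # start from pure noise

# Simple DDPM-style reverse loop: predict noise, remove a fraction each step.
betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)
with torch.no_grad():
    for t in reversed(range(T_STEPS)):
        eps = denoiser(x, cond, t)  # predicted noise at step t
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
# x now holds the generated pose sequence of shape (B, T_POSE, P).
```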
- Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production [9.065171626657818]
Universal Gloss-level Representation (UniGloR) is a unified and self-supervised solution for both Sign Language Translation and Sign Language Production.
Our results demonstrate UniGloR's effectiveness in the translation and production tasks.
Our study suggests that self-supervised learning can be applied in a unified manner, paving the way for innovative and practical applications.
arXiv Detail & Related papers (2024-07-03T07:12:36Z)
- A Transformer Model for Boundary Detection in Continuous Sign Language [55.05986614979846]
The Transformer model is employed for both Isolated Sign Language Recognition and Continuous Sign Language Recognition.
The training process uses isolated sign videos, in which hand keypoint features extracted from the input video are enriched.
The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos.
arXiv Detail & Related papers (2024-02-22T17:25:01Z)
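The boundary-detection entry above couples a trained frame-level model with a post-processing step to locate isolated sign boundaries in continuous video. The sketch below shows one plausible form of such post-processing: smoothing per-frame sign scores and converting thresholded runs into (start, end) segments. The scorer here is random and every threshold is an assumption, not the paper's method.

```python
# Illustrative post-processing for sign boundary detection: smooth per-frame
# "sign vs. transition" probabilities, then turn contiguous runs into segments.
import torch

def segments_from_scores(scores: torch.Tensor, threshold: float = 0.5,
                         min_len: int = 5) -> list[tuple[int, int]]:
    """Convert per-frame sign probabilities into (start, end) boundaries."""
    # Moving-average smoothing to suppress frame-level jitter.
    kernel = torch.ones(1, 1, 5) / 5.0
    smooth = torch.nn.functional.conv1d(scores.view(1, 1, -1), kernel, padding=2).view(-1)
    is_sign = smooth > threshold
    segments, start = [], None
    for i, flag in enumerate(is_sign.tolist()):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:  # drop spurious short runs
                segments.append((start, i))
            start = None
    if start is not None and len(is_sign) - start >= min_len:
        segments.append((start, len(is_sign)))
    return segments

frame_scores = torch.rand(200)  # stand-in for trained model outputs over 200 frames
print(segments_from_scores(frame_scores))
```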
- Linguistically Motivated Sign Language Segmentation [51.06873383204105]
We consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases.
Our method is motivated by linguistic cues observed in sign language corpora.
We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing.
arXiv Detail & Related papers (2023-10-21T10:09:34Z)
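The segmentation entry above replaces IO tagging with BIO tagging so that back-to-back signs with no pause between them remain separable. A minimal sketch of decoding per-frame BIO tags into sign spans, with tag values and the decoder chosen purely for illustration:

```python
# Illustrative sketch of why BIO beats IO for continuous signing: under IO,
# two back-to-back signs merge into one "I" run; the "B" tag marks each new
# sign's first frame. Tag values and this decoder are stand-ins.

O, B, I = 0, 1, 2  # outside, begin-sign, inside-sign

def decode_bio(tags: list[int]) -> list[tuple[int, int]]:
    """Turn a per-frame BIO tag sequence into (start, end) sign spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == B:  # a new sign starts, so close any open span first
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == O and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

# Two adjacent signs with no O gap: IO tagging would yield one span, BIO two.
tags = [O, B, I, I, B, I, I, O]
print(decode_bio(tags))  # [(1, 4), (4, 7)]
```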
- Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [43.45785951443149]
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts.
Current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences.
We tackle large-scale SLP by learning to co-articulate between dictionary signs.
We also propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos.
arXiv Detail & Related papers (2022-03-29T08:51:38Z)
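The Signing at Scale entry above learns to co-articulate between dictionary signs. As a rough stand-in for that learned behaviour, the sketch below stitches dictionary pose clips with a linear cross-fade instead of a hard cut; the overlap length and blending rule are assumptions, not the paper's learned model.

```python
# Illustrative co-articulation between dictionary signs: neighbouring pose
# clips are joined by linearly blending the last frames of one clip into the
# first frames of the next, rather than hard-cutting between them.
import torch

def coarticulate(signs: list[torch.Tensor], overlap: int = 8) -> torch.Tensor:
    """Stitch (T_i, P) pose clips with a linear cross-fade of `overlap` frames."""
    out = signs[0]
    for nxt in signs[1:]:
        w = torch.linspace(0.0, 1.0, overlap).unsqueeze(1)  # per-frame blend weights
        blended = (1 - w) * out[-overlap:] + w * nxt[:overlap]
        out = torch.cat([out[:-overlap], blended, nxt[overlap:]], dim=0)
    return out

clips = [torch.randn(40, 150), torch.randn(35, 150)]  # two stand-in dictionary signs
print(coarticulate(clips).shape)  # torch.Size([67, 150]), i.e. 40 + 35 - 8
```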
- All You Need In Sign Language Production [50.3955314892191]
Sign language recognition and production need to cope with some critical challenges.
We present an introduction to Deaf culture, Deaf centers, and the psychological perspective on sign language.
Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented.
arXiv Detail & Related papers (2022-01-05T13:45:09Z)
- Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks [37.679114155300084]
Sign Language Production (SLP) must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community.
We propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D sign pose sequences.
We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a Mixture Density Network (MDN) formulation to produce realistic and expressive sign pose sequences.
arXiv Detail & Related papers (2021-03-11T22:11:17Z)
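The Progressive Transformers entry above uses a Mixture Density Network (MDN) head so that outputs stay expressive rather than averaged. Below is a generic MDN head with diagonal Gaussians: the decoder feature at each frame parameterises a K-component mixture over poses, and sampling picks a component before drawing a pose. Sizes and the diagonal-covariance choice are assumptions, not the paper's configuration.

```python
# Illustrative MDN head: instead of regressing one pose (which averages out
# articulation), predict a K-component Gaussian mixture per frame and sample.
import torch
import torch.nn as nn

K, D, P = 5, 64, 150  # mixture components, feature dim, pose dim (illustrative)

class MDNHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.pi = nn.Linear(D, K)             # mixture logits
        self.mu = nn.Linear(D, K * P)         # component means
        self.log_sigma = nn.Linear(D, K * P)  # per-dimension log std devs

    def forward(self, h):  # h: decoder features of shape (B, T, D)
        B, T, _ = h.shape
        pi = torch.softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(B, T, K, P)
        sigma = self.log_sigma(h).view(B, T, K, P).exp()
        return pi, mu, sigma

    def sample(self, h):
        pi, mu, sigma = self(h)
        k = torch.distributions.Categorical(pi).sample()  # component per frame, (B, T)
        k = k[..., None, None].expand(-1, -1, 1, mu.size(-1))
        mu_k = mu.gather(2, k).squeeze(2)      # means of the chosen components
        sigma_k = sigma.gather(2, k).squeeze(2)
        return mu_k + sigma_k * torch.randn_like(mu_k)  # sampled poses, (B, T, P)

head = MDNHead()
poses = head.sample(torch.randn(2, 32, D))
print(poses.shape)  # torch.Size([2, 32, 150])
```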
- Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video [43.45785951443149]
To be truly understandable by Deaf communities, an automatic Sign Language Production system must generate a photo-realistic signer.
We propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language.
A pose-conditioned human synthesis model is then introduced to generate a photo-realistic sign language video from the skeletal pose sequence.
arXiv Detail & Related papers (2020-11-19T14:31:06Z)
- Progressive Transformers for End-to-End Sign Language Production [43.45785951443149]
The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video.
Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences.
We propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language.
arXiv Detail & Related papers (2020-04-30T15:20:25Z)