Related papers: Progressive Transformers for End-to-End Sign Language Production

Progressive Transformers for End-to-End Sign Language Production

URL: http://arxiv.org/abs/2004.14874v2
Date: Mon, 20 Jul 2020 10:20:03 GMT
Title: Progressive Transformers for End-to-End Sign Language Production
Authors: Ben Saunders and Necati Cihan Camgoz and Richard Bowden
Abstract summary: The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences. We propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language.
Score: 43.45785951443149
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video at a level comparable to a human translator. If this was achievable, then it would revolutionise Deaf hearing communications. Previous work on predominantly isolated SLP has shown the need for architectures that are better suited to the continuous domain of full sign sequences. In this paper, we propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language. We present two model configurations, an end-to-end network that produces sign direct from text and a stacked network that utilises a gloss intermediary. Our transformer network architecture introduces a counter that enables continuous sequence generation at training and inference. We also provide several data augmentation processes to overcome the problem of drift and improve the performance of SLP models. We propose a back translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging RWTH-PHOENIX-Weather-2014T(PHOENIX14T) dataset and setting baselines for future research.

Related papers

Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs. We propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs.
arXiv Detail & Related papers (2024-11-26T18:28:09Z)
A Data-Driven Representation for Sign Language Production [26.520016084139964]
Sign Language Production aims to automatically translate spoken language sentences into continuous sequences of sign language. Current state-of-the-art approaches rely on scarce linguistic resources to work. This paper introduces an innovative solution by transforming the continuous pose generation problem into a discrete sequence generation problem.
arXiv Detail & Related papers (2024-04-17T15:52:38Z)
Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations. It has been a challenging task due to the modality gap between sign videos and texts and the data scarcity of labeled data. We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [43.45785951443149]
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. Current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences. We tackle large-scale SLP by learning to co-articulate between dictionary signs. We also propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos.
arXiv Detail & Related papers (2022-03-29T08:51:38Z)
SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods need to read all the videos before starting the translation. We propose SimulSLT, the first end-to-end simultaneous sign language translation model. SimulSLT achieves BLEU scores that exceed the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks [37.679114155300084]
Sign Language Production (SLP) must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community. We propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D sign pose sequences. We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a Mixture Density Network (MDN) formulation to produce realistic and expressive sign pose sequences.
arXiv Detail & Related papers (2021-03-11T22:11:17Z)
Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video [43.45785951443149]
To be truly understandable by Deaf communities, an automatic Sign Language Production system must generate a photo-realistic signer. We propose SignGAN, the first SLP model to produce photo-realistic continuous sign language videos directly from spoken language. A pose-conditioned human synthesis model is then introduced to generate a photo-realistic sign language video from the skeletal pose sequence.
arXiv Detail & Related papers (2020-11-19T14:31:06Z)
Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
Multi-channel Transformers for Multi-articulatory Sign Language Translation [59.38247587308604]
We tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture. The proposed architecture allows both the inter and intra contextual relationships between different sign articulators to be modelled within the transformer network itself.
arXiv Detail & Related papers (2020-09-01T09:10:55Z)
Adversarial Training for Multi-Channel Sign Language Production [43.45785951443149]
We propose an Adversarial Multi-Channel approach to Sign Language Production. We frame sign production as a minimax game between a transformer-based Generator and a conditional Discriminator. Our adversarial discriminator evaluates the realism of sign production conditioned on the source text, pushing the generator towards a realistic and articulate output.
arXiv Detail & Related papers (2020-08-27T23:05:54Z)
Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation. We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset. Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.