Multi-channel Transformers for Multi-articulatory Sign Language
Translation
- URL: http://arxiv.org/abs/2009.00299v1
- Date: Tue, 1 Sep 2020 09:10:55 GMT
- Title: Multi-channel Transformers for Multi-articulatory Sign Language
Translation
- Authors: Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, Richard Bowden
- Abstract summary: We tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture.
The proposed architecture allows both the inter- and intra-contextual relationships between different sign articulators to be modelled within the transformer network itself.
- Score: 59.38247587308604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign languages use multiple asynchronous information channels (articulators),
not just the hands but also the face and body, which computational approaches
often ignore. In this paper we tackle the multi-articulatory sign language
translation task and propose a novel multi-channel transformer architecture.
The proposed architecture allows both the inter- and intra-contextual
relationships between different sign articulators to be modelled within the
transformer network itself, while also maintaining channel-specific
information. We evaluate our approach on the RWTH-PHOENIX-Weather-2014T dataset
and report competitive translation performance. Importantly, we overcome the
reliance on gloss annotations which underpin other state-of-the-art approaches,
thereby removing the future need for expensive curated datasets.
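As a rough illustration of the core idea only (this is not the authors' code), the sketch below shows an encoder layer that keeps one feature stream per articulator channel, applying self-attention within each channel (intra-channel) followed by cross-attention over the remaining channels (inter-channel). The module names, dimensions, and the omission of feed-forward sublayers are all simplifying assumptions.

```python
# Minimal sketch (NOT the authors' implementation) of a multi-channel
# encoder layer: each articulator channel (e.g. hands, face, body) keeps
# its own stream; self-attention models intra-channel context, then
# cross-attention over the other channels models inter-channel context.
import torch
import torch.nn as nn

class MultiChannelEncoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_channels=3):
        super().__init__()
        self.self_attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_channels))
        self.cross_attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_channels))
        self.norm1 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_channels))
        self.norm2 = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(n_channels))

    def forward(self, channels):
        # channels: list of (batch, time, d_model) tensors, one per articulator
        out = []
        for i, x in enumerate(channels):
            h, _ = self.self_attn[i](x, x, x)               # intra-channel
            h = self.norm1[i](x + h)
            others = torch.cat(
                [c for j, c in enumerate(channels) if j != i], dim=1)
            m, _ = self.cross_attn[i](h, others, others)    # inter-channel
            out.append(self.norm2[i](h + m))
        return out  # one stream per channel: channel-specific info is kept

# Toy usage: 3 channels (hands, face, body), batch 2, 10 frames each.
layer = MultiChannelEncoderLayer()
outs = layer([torch.randn(2, 10, 256) for _ in range(3)])
```

Returning one stream per channel, rather than fusing them, mirrors the stated goal of maintaining channel-specific information alongside the cross-channel modelling.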
Related papers
- Leveraging Timestamp Information for Serialized Joint Streaming
Recognition and Translation [51.399695200838586]
We propose a streaming Transformer-Transducer (T-T) model able to jointly produce many-to-one and one-to-many transcription and translation using a single decoder.
Experiments on it, es, de -> en demonstrate the effectiveness of our approach, enabling the generation of one-to-many joint outputs with a single decoder for the first time.
arXiv Detail & Related papers (2023-10-23T11:00:27Z)
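As a toy illustration of what such a serialized joint output could look like, the snippet below packs one transcript plus several translations into a single token stream with timestamp and language tags; the tag format is invented for this sketch and is not taken from the paper.

```python
# Hypothetical serialization of a one-to-many output (transcript plus
# several translations) into one stream a single decoder could emit.
def serialize(segments):
    """segments: list of (start_sec, end_sec, {lang: text}) tuples."""
    stream = []
    for start, end, outputs in segments:
        stream.append(f"<time:{start:.1f}-{end:.1f}>")   # timestamp tag
        for lang, text in outputs.items():
            stream += [f"<lang:{lang}>", text]           # language tag + text
    return " ".join(stream)

print(serialize([(0.0, 2.1, {"de": "guten morgen", "en": "good morning"})]))
# -> <time:0.0-2.1> <lang:de> guten morgen <lang:en> good morning
```

- Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens [15.283483438956264]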
We introduce Representational Transfer Potential (RTP), which measures representational similarities between languages.
We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality.
We develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages.
arXiv Detail & Related papers (2023-05-19T09:36:48Z)
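A minimal sketch of the general idea behind such an auxiliary similarity loss: penalize dissimilarity between the pooled encoder states of parallel sentences. Mean pooling and the cosine objective here are assumptions, not the paper's exact formulation.

```python
# Illustrative auxiliary loss encouraging encoder representations of
# parallel sentences to be similar across languages (a sketch only).
import torch
import torch.nn.functional as F

def similarity_loss(enc_src, enc_tgt):
    """enc_src, enc_tgt: (batch, time, d_model) encoder states of a
    parallel sentence pair; returns 1 - mean cosine similarity of the
    mean-pooled sentence vectors."""
    src, tgt = enc_src.mean(dim=1), enc_tgt.mean(dim=1)
    return (1 - F.cosine_similarity(src, tgt, dim=-1)).mean()

h_src, h_tgt = torch.randn(4, 12, 256), torch.randn(4, 9, 256)
aux = similarity_loss(h_src, h_tgt)
# Training would combine it as: total = translation_nll + lambda * aux
```

- Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks [16.8212280804151]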
Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer.
We propose novel methods for using language-specific subnetworks, which control cross-lingual parameter sharing.
We combine our methods with meta-learning, an established, but complementary, technique for improving cross-lingual transfer.
arXiv Detail & Related papers (2022-10-31T19:23:33Z)
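As a rough sketch of how per-language subnetworks can control parameter sharing, the layer below applies a binary mask, chosen by language, over a weight matrix; the fixed random masks are an illustrative stand-in for whatever learned or pruned masks the paper actually uses.

```python
# Illustrative language-specific subnetworks via binary weight masks
# (a sketch; in practice the masks would be learned or pruned).
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, d_in, d_out, langs=("de", "en")):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        # Random fixed masks for illustration only.
        self.masks = {l: (torch.rand_like(self.linear.weight) > 0.5).float()
                      for l in langs}

    def forward(self, x, lang):
        w = self.linear.weight * self.masks[lang]  # that language's subnetwork
        return nn.functional.linear(x, w, self.linear.bias)

layer = MaskedLinear(256, 256)
y = layer(torch.randn(2, 256), lang="de")
```

- Hierarchical Local-Global Transformer for Temporal Sentence Grounding [58.247592985849124]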
This paper studies the multimedia problem of temporal sentence grounding,
which aims to locate, within an untrimmed video, the specific segment described by a given sentence query.
arXiv Detail & Related papers (2022-08-31T14:16:56Z)
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
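A minimal sketch of the dual-decoder layout under assumed shapes and names: one shared speech encoder feeding two task-specific decoders. This sketch omits any interaction between the decoders, and the joint-loss weighting is an assumption.

```python
# Shared encoder, two decoders (ASR and ST) -- a structural sketch only.
import torch
import torch.nn as nn

d_model, vocab = 256, 1000
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
asr_dec = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
st_dec = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
asr_head, st_head = nn.Linear(d_model, vocab), nn.Linear(d_model, vocab)

speech = torch.randn(2, 50, d_model)   # acoustic features (toy values)
asr_in = torch.randn(2, 12, d_model)   # embedded transcript prefix
st_in = torch.randn(2, 14, d_model)    # embedded translation prefix

memory = encoder(speech)               # shared acoustic memory
asr_logits = asr_head(asr_dec(asr_in, memory))
st_logits = st_head(st_dec(st_in, memory))
# Joint training: loss = asr_nll + lambda * st_nll (weighting assumed).
```

- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]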
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This helps avoid the degenerate case of predicting masked words conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
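As a minimal sketch of plugging cross-attention into an encoder layer (names and wiring are assumptions), the layer below lets one language's stream attend to the parallel sentence in another language, so masked-word prediction is not conditioned only on same-language context.

```python
# Encoder layer with an extra cross-attention sub-layer over a parallel
# sentence in another language (an illustrative sketch).
import torch
import torch.nn as nn

class CrossLingualEncoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, parallel):
        h, _ = self.self_attn(x, x, x)                   # own-language context
        h = self.norm1(x + h)
        c, _ = self.cross_attn(h, parallel, parallel)    # cross-lingual context
        return self.norm2(h + c)

layer = CrossLingualEncoderLayer()
out = layer(torch.randn(2, 10, 256), torch.randn(2, 13, 256))
```

- Progressive Transformers for End-to-End Sign Language Production [43.45785951443149]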
The goal of automatic Sign Language Production (SLP) is to translate spoken language to a continuous stream of sign language video.
Previous work, which has focused predominantly on isolated SLP, has shown the need for architectures better suited to the continuous domain of full sign sequences.
We propose Progressive Transformers, a novel architecture that can translate from discrete spoken language sentences to continuous 3D skeleton pose outputs representing sign language.
arXiv Detail & Related papers (2020-04-30T15:20:25Z)
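A loose sketch of the continuous-output idea: replace the decoder's vocabulary softmax with a regression head over 3D joint coordinates. The shapes, names, and loss choice below are assumptions, not the authors' model.

```python
# Decoder regressing continuous 3D skeleton poses instead of tokens.
import torch
import torch.nn as nn

d_model, n_joints = 256, 50
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
pose_head = nn.Linear(d_model, n_joints * 3)  # x, y, z per joint

text_memory = torch.randn(2, 8, d_model)   # encoded spoken-language sentence
pose_prefix = torch.randn(2, 20, d_model)  # embedded previously generated frames
frames = pose_head(decoder(pose_prefix, text_memory)).view(2, 20, n_joints, 3)
# Training would use a regression loss (e.g. MSE) against ground-truth poses.
```

- Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]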
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign-video-to-spoken-language and gloss-to-spoken-language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
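A sketch of what such joint training could look like: a CTC-style recognition loss over gloss logits from the encoder plus a cross-entropy translation loss from the decoder, combined with a weighting factor. The summary only states that recognition and translation are learned jointly; the specific losses and weighting below are assumptions.

```python
# Joint recognition + translation objective (illustrative loss choices).
import torch
import torch.nn.functional as F

enc_logits = torch.randn(60, 2, 120).log_softmax(-1)  # (time, batch, gloss vocab)
glosses = torch.randint(1, 120, (2, 7))               # gloss targets (0 = blank)
rec_loss = F.ctc_loss(enc_logits, glosses,
                      input_lengths=torch.full((2,), 60),
                      target_lengths=torch.full((2,), 7))

dec_logits = torch.randn(2, 15, 3000)                 # (batch, len, word vocab)
words = torch.randint(0, 3000, (2, 15))
trans_loss = F.cross_entropy(dec_logits.view(-1, 3000), words.view(-1))

loss = rec_loss + 1.0 * trans_loss                    # weighting assumed
```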
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.