Modeling Intensification for Sign Language Generation: A Computational
Approach
- URL: http://arxiv.org/abs/2203.09679v1
- Date: Fri, 18 Mar 2022 01:13:21 GMT
- Authors: Mert İnan, Yang Zhong, Sabit Hassan, Lorna Quandt, Malihe Alikhani
- Score: 13.57903290481737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end sign language generation models do not accurately represent the
prosody in sign language. A lack of temporal and spatial variations leads to
poor-quality generated presentations that confuse human interpreters. In this
paper, we aim to improve the prosody in generated sign languages by modeling
intensification in a data-driven manner. We present different strategies
grounded in linguistics of sign language that inform how intensity modifiers
can be represented in gloss annotations. To employ our strategies, we first
annotate a subset of the benchmark PHOENIX-14T, a German Sign Language dataset,
with different levels of intensification. We then use a supervised intensity
tagger to extend the annotated dataset and obtain labels for the remaining
portion of it. This enhanced dataset is then used to train state-of-the-art
transformer models for sign language generation. We find that our efforts in
intensification modeling yield better results when evaluated with automatic
metrics. Human evaluation also indicates a higher preference for the videos
generated using our model.
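The annotate-then-extend pipeline in the abstract (hand-label a subset, train a supervised intensity tagger, label the rest) can be sketched with a toy voting tagger. All gloss tokens, labels, and the voting scheme below are invented for illustration and are not taken from PHOENIX-14T or the authors' actual model.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus of (German gloss sequence, intensity level 0-2);
# glosses and labels are invented, not drawn from PHOENIX-14T.
annotated = [
    ("REGEN STARK", 2),
    ("REGEN", 0),
    ("WIND SEHR STARK", 2),
    ("SONNE BISSCHEN", 1),
    ("WOLKE", 0),
]

def train_tagger(data):
    """Count how often each gloss token co-occurs with each intensity level."""
    counts = defaultdict(Counter)
    for sentence, level in data:
        for tok in sentence.split():
            counts[tok][level] += 1
    return counts

def tag(counts, sentence):
    """Label a sentence with the intensity level its tokens vote for most."""
    votes = Counter()
    for tok in sentence.split():
        votes.update(counts.get(tok, Counter()))
    return votes.most_common(1)[0][0] if votes else 0

# Extend the annotated subset to unlabeled glosses, mirroring the pipeline.
model = train_tagger(annotated)
print(tag(model, "WIND STARK"))   # -> 2 (intensifier present)
print(tag(model, "WOLKE REGEN"))  # -> 0 (no intensifier)
```

A real tagger would be a trained classifier over contextual features; the point here is only the flow from a small labeled subset to labels for the full dataset.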
Related papers
- Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings [5.257719744958367]
This thesis explores three challenging settings in text classification by leveraging the intrinsic knowledge of pretrained language models (PLMs).
We develop models that utilize features based on contextualized word representations from PLMs, achieving performance that rivals or surpasses human accuracy.
Lastly, we tackle the sensitivity of large language models to in-context learning prompts by selecting effective demonstrations.
arXiv Detail & Related papers (2024-08-28T09:07:30Z)
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments in language generation benchmarks show that GanLM with the powerful language understanding capability outperforms various strong pre-trained language models.
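The replaced-token-detection objective mentioned above can be illustrated with a toy per-token target: the discriminator predicts, for each position, whether the generator swapped the token. The sentences below are invented.

```python
# Toy replaced-token-detection targets: 1 where a token was swapped.
# In a real model, a discriminator predicts these labels from context.
original  = ["the", "cat", "sat", "down"]
corrupted = ["the", "dog", "sat", "down"]  # generator replaced "cat" -> "dog"
labels = [int(o != c) for o, c in zip(original, corrupted)]
print(labels)  # -> [0, 1, 0, 0]
```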
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model to encode speech in bidirectional rich encodings.
The approach uses a masked language modelling objective to learn the representations.
We show that the performance of the resulting encodings is better than comparable models on multiple datasets.
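The masked language modelling objective used here operates on speech representations; the masking step itself can be sketched on word tokens as a stand-in. Token names, mask rate, and mask symbol below are invented.

```python
import random

def mask_tokens(tokens, mask_rate=0.3, mask_symbol="[MASK]", seed=0):
    """Randomly hide a fraction of tokens; the encoder must reconstruct them,
    which forces it to use context from both directions."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok           # remember the ground truth
            masked.append(mask_symbol)
        else:
            masked.append(tok)
    return masked, targets

tokens = ["the", "forecast", "calls", "for", "heavy", "rain"]
masked, targets = mask_tokens(tokens)
print(masked, targets)
```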
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
- Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling [41.733860809136196]
We propose an effective and efficient method to learn robust discrete speech representation for generative spoken language modeling.
The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme.
We additionally evaluate our method on the speech-to-speech translation task, considering Spanish-English and French-English translations, and show the proposed approach outperforms the evaluated baselines.
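The core idea above, that discrete units assigned to a clean signal should supervise its transformed views, reduces to a one-dimensional sketch. The centroids, frame values, and shift are all invented, and real speech features are high-dimensional.

```python
# Toy 1-D "speech": quantize frames to the nearest centroid; pseudo-labels
# from the clean signal supervise an augmented (shifted) view of it.
centroids = [0.0, 0.5, 1.0]

def encode(x):
    """Return the index of the nearest centroid (the discrete unit)."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

clean = [0.05, 0.48, 0.97]
augmented = [v + 0.1 for v in clean]        # a stand-in signal transformation
pseudo_labels = [encode(v) for v in clean]  # targets for the augmented view
print(pseudo_labels)                         # -> [0, 1, 2]
print([encode(v) for v in augmented])        # robustness check -> [0, 1, 2]
```

Iterating this step (re-encode, re-label, re-train) is the pseudo-labeling loop the abstract refers to.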
arXiv Detail & Related papers (2022-09-30T14:15:03Z)
- Classification of Phonological Parameters in Sign Languages [0.0]
Linguistic research often breaks down signs into constituent parts to study sign languages.
We show how a single model can be used to recognise the individual phonological parameters within sign languages.
arXiv Detail & Related papers (2022-05-24T13:40:45Z)
- Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT [6.705577865528099]
We combine character-level and contextual language model representations to improve performance on parsing.
For English, these improvements are larger than adding individual sources of linguistic information.
A new method of analysis based on semantic tags demonstrates that the character-level representations improve performance across a subset of selected semantic phenomena.
arXiv Detail & Related papers (2020-11-09T10:24:12Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
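A heavily simplified view of decode-time blocking: forbid source surface tokens during generation so the decoder must choose paraphrastic alternatives. The actual Dynamic Blocking algorithm samples which source n-grams to block; the candidate lists and tokens below are invented.

```python
# Greedy decoding where every source token is blocked, forcing a rewording.
def decode_with_blocking(candidates_per_step, source_tokens):
    blocked = set(source_tokens)
    out = []
    for cands in candidates_per_step:  # cands: tokens ranked by model score
        # Take the best candidate not in the blocked set; fall back if all are.
        pick = next((t for t in cands if t not in blocked), cands[0])
        out.append(pick)
    return out

src = ["how", "are", "you"]
steps = [["how", "in"], ["are", "what"], ["you", "way"]]
print(decode_with_blocking(steps, src))  # -> ['in', 'what', 'way']
```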
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models combined with appropriate training settings can effectively yield universal representations.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.