Keypoint based Sign Language Translation without Glosses
- URL: http://arxiv.org/abs/2204.10511v1
- Date: Fri, 22 Apr 2022 05:37:56 GMT
- Title: Keypoint based Sign Language Translation without Glosses
- Authors: Youngmin Kim, Minji Kwak, Dain Lee, Yeongeun Kim, Hyeongboo Baek
- Abstract summary: We propose a new keypoint normalization method for performing translation based on the skeleton points of the signer.
Normalization customized to each body part contributes to improved performance.
Our method can be applied to various datasets, including those without glosses.
- Score: 7.240731862549344
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sign Language Translation (SLT) has been studied relatively little
compared to Sign Language Recognition (SLR). However, SLR recognizes the unique
grammar of sign language, which differs from spoken language and cannot be
easily interpreted by non-disabled people. We therefore address the problem of
translating sign language video directly into spoken language. To this end, we
propose a new keypoint normalization method that performs translation based on
the signer's skeleton points and robustly normalizes these points for sign
language translation. This normalization, customized to each body part,
contributes to improved performance. In addition, we propose a stochastic frame
selection method that enables frame augmentation and sampling at the same time.
Finally, the result is translated into spoken language through an
attention-based translation model. Our method can be applied to various
datasets, including those without glosses. In addition, quantitative
experimental evaluation demonstrates the effectiveness of our method.
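To make the per-body-part normalization concrete, here is a minimal sketch. The keypoint indices, part groupings, and scale choices (neck center and shoulder width for the body; each wrist and hand extent for the hands) are illustrative assumptions, not the paper's exact layout.

```python
import numpy as np

# Hypothetical keypoint layout: indices 0-7 body (neck=1, shoulders=2 and 5),
# 8-28 left hand (wrist=8), 29-49 right hand (wrist=29).
BODY = slice(0, 8)
LHAND = slice(8, 29)
RHAND = slice(29, 50)

def normalize_part(points, origin, scale, eps=1e-6):
    """Translate a group of keypoints to its own origin and rescale."""
    return (points - origin) / (scale + eps)

def normalize_frame(kp):
    """kp: (50, 2) array of (x, y) keypoints for one frame."""
    out = kp.copy()
    neck = kp[1]
    shoulder_width = np.linalg.norm(kp[2] - kp[5])
    # Body: centered on the neck, scaled by shoulder width.
    out[BODY] = normalize_part(kp[BODY], neck, shoulder_width)
    # Each hand: centered on its own wrist and scaled by its own extent,
    # so fine finger motion is not dwarfed by body-scale coordinates.
    for part, wrist in ((LHAND, kp[8]), (RHAND, kp[29])):
        extent = np.ptp(kp[part], axis=0).max()
        out[part] = normalize_part(kp[part], wrist, extent)
    return out
```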
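The stochastic frame selection can likewise be sketched as segment-wise random sampling: the clip is divided into a fixed number of segments and one frame is drawn at random from each, so a single operation both resamples the clip to a fixed length and yields a different frame sequence every epoch. The segmentation scheme below is an assumption about how such a method could work, not the paper's exact procedure.

```python
import numpy as np

def stochastic_frame_select(num_frames, target_len, rng=None):
    """Pick `target_len` frame indices from a clip of `num_frames` frames.

    Splits the clip into `target_len` equal segments and draws one random
    index per segment: short clips are upsampled (segments may repeat
    frames), long clips are downsampled, and the randomness acts as
    augmentation across epochs.
    """
    rng = rng or np.random.default_rng()
    edges = np.linspace(0, num_frames, target_len + 1)
    idx = [int(rng.integers(int(lo), max(int(lo) + 1, int(hi))))
           for lo, hi in zip(edges[:-1], edges[1:])]
    return np.clip(idx, 0, num_frames - 1)

# e.g. a 37-frame clip mapped to 16 stochastically chosen frames:
print(stochastic_frame_select(37, 16))
```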
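The abstract does not specify the translation architecture beyond "attention-based", so the following is a generic attention encoder-decoder over flattened keypoint frames; all dimensions (`kp_dim`, hidden size, vocabulary size) are placeholder assumptions.

```python
import torch
import torch.nn as nn

class KeypointTranslator(nn.Module):
    """Generic attention encoder-decoder: keypoint frames -> word logits."""

    def __init__(self, kp_dim=100, hid=256, vocab=8000):
        super().__init__()
        self.encoder = nn.GRU(kp_dim, hid, batch_first=True, bidirectional=True)
        self.embed = nn.Embedding(vocab, hid)
        self.decoder = nn.GRU(hid, hid, batch_first=True)
        self.attn = nn.MultiheadAttention(hid, num_heads=4, batch_first=True,
                                          kdim=2 * hid, vdim=2 * hid)
        self.out = nn.Linear(2 * hid, vocab)

    def forward(self, frames, tokens):
        # frames: (B, T, kp_dim) normalized keypoints; tokens: (B, L) word ids.
        memory, _ = self.encoder(frames)           # (B, T, 2*hid)
        dec, _ = self.decoder(self.embed(tokens))  # (B, L, hid)
        ctx, _ = self.attn(dec, memory, memory)    # attend over frames
        return self.out(torch.cat([dec, ctx], dim=-1))  # (B, L, vocab)
```

With teacher forcing, `model(frames, tokens[:, :-1])` would be trained with cross-entropy against `tokens[:, 1:]`.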
Related papers
- Diverse Sign Language Translation [27.457810402402387]
We introduce a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos.
We employ large language models (LLM) to generate multiple references for the widely-used CSL-Daily and PHOENIX14T SLT datasets.
Specifically, we investigate multi-reference training strategies to enable our DivSLT model to achieve diverse translations.
arXiv Detail & Related papers (2024-10-25T14:28:20Z)
- Reconsidering Sentence-Level Sign Language Translation [2.099922236065961]
We show that for 33% of sentences in our sample, our fluent Deaf signer annotators were only able to understand key parts of the clip in light of discourse-level context.
These results underscore the importance of understanding and sanity checking examples when adapting machine learning to new domains.
arXiv Detail & Related papers (2024-06-16T19:19:54Z)
- SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale [22.49602248323602]
A persistent challenge in sign language video processing is how we learn representations of sign language.
Our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body posture of the signer.
Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training.
arXiv Detail & Related papers (2024-06-11T03:00:41Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation [79.96416609433724]
Zero-shot translation (ZST) aims to translate between unseen language pairs in training data.
The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs.
Recent studies have shown that language IDs sometimes fail to navigate the ZST task, making them suffer from the off-target problem.
arXiv Detail & Related papers (2023-09-28T17:02:36Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT method based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique [0.0]
We present an efficient corpus-building framework for sign language translation.
By considering the linguistic features of sign language, our proposed framework is the first attempt to build a multimodal sign language augmentation corpus.
arXiv Detail & Related papers (2022-07-12T02:12:36Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit to the word order of the source language might fail to handle target languages whose word order differs.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)