Towards the extraction of robust sign embeddings for low resource sign language recognition
- URL: http://arxiv.org/abs/2306.17558v2
- Date: Wed, 16 Aug 2023 08:46:52 GMT
- Title: Towards the extraction of robust sign embeddings for low resource sign language recognition
- Authors: Mathieu De Coster, Ellen Rushe, Ruth Holmes, Anthony Ventresque, Joni Dambre
- Abstract summary: We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance.
We furthermore achieve better performance using fine-tuned transferred embeddings than models trained only on the target sign language.
- Score: 7.969704867355098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Isolated Sign Language Recognition (SLR) has mostly been applied on datasets
containing signs executed slowly and clearly by a limited group of signers. In
real-world scenarios, however, we are met with challenging visual conditions,
coarticulated signing, small datasets, and the need for signer independent
models. To tackle this difficult problem, we require a robust feature extractor
to process the sign language videos. One could expect human pose estimators to
be ideal candidates. However, due to a domain mismatch with their training sets
and challenging poses in sign language, they lack robustness on sign language
data and image-based models often still outperform keypoint-based models.
Furthermore, whereas the common practice of transfer learning with image-based
models yields even higher accuracy, keypoint-based models are typically trained
from scratch on every SLR dataset. These factors limit their usefulness for
SLR. From the existing literature, it is also not clear which, if any, pose
estimator performs best for SLR. We compare the three most popular pose
estimators for SLR: OpenPose, MMPose and MediaPipe. We show that through
keypoint normalization, missing keypoint imputation, and learning a pose
embedding, we can obtain significantly better results and enable transfer
learning. We show that keypoint-based embeddings contain cross-lingual
features: they can transfer between sign languages and achieve competitive
performance even when fine-tuning only the classifier layer of an SLR model on
a target sign language. We furthermore achieve better performance using
fine-tuned transferred embeddings than models trained only on the target sign
language. The embeddings can also be learned in a multilingual fashion. The
application of these embeddings could prove particularly useful for low
resource sign languages in the future.
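A minimal sketch of the kind of pipeline the abstract describes, assuming a simple per-frame MLP pose embedding, naive zero imputation for missing keypoints, and mean pooling over time; none of these choices are taken from the paper, and all names and dimensions are illustrative. The embedding is frozen so that only the classifier layer is fine-tuned on the target sign language, mirroring the transfer setup mentioned above.

```python
# Illustrative sketch only: the normalization, imputation and embedding
# choices below are assumptions, not the authors' implementation.
import torch
import torch.nn as nn

def normalize_keypoints(kp: torch.Tensor) -> torch.Tensor:
    """Center and scale (T, K, 2) keypoints; missing points are NaN."""
    valid = ~torch.isnan(kp).any(dim=-1, keepdim=True)          # (T, K, 1)
    filled = torch.where(valid, kp, torch.zeros_like(kp))
    counts = valid.sum(dim=1, keepdim=True).clamp(min=1)
    center = filled.sum(dim=1, keepdim=True) / counts            # per-frame centroid
    centered = torch.where(valid, kp - center, kp)
    scale = centered[valid.expand_as(centered)].abs().max().clamp(min=1e-6)
    return centered / scale

def impute_missing(kp: torch.Tensor) -> torch.Tensor:
    """Very naive imputation: replace still-missing keypoints with zeros."""
    return torch.nan_to_num(kp, nan=0.0)

class PoseEmbedding(nn.Module):
    """Small per-frame MLP that maps keypoints to a pose embedding."""
    def __init__(self, num_keypoints: int = 75, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, 512), nn.ReLU(),
            nn.Linear(512, dim),
        )

    def forward(self, kp: torch.Tensor) -> torch.Tensor:         # (T, K, 2)
        return self.net(kp.flatten(1))                            # (T, dim)

class SLRModel(nn.Module):
    """Pose embedding + temporal mean pooling + linear classifier."""
    def __init__(self, num_classes: int, num_keypoints: int = 75):
        super().__init__()
        self.embed = PoseEmbedding(num_keypoints)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, kp: torch.Tensor) -> torch.Tensor:
        feats = self.embed(impute_missing(normalize_keypoints(kp)))
        return self.classifier(feats.mean(dim=0))                 # pool over time

# Transfer setup: reuse the embedding learned on a source sign language and
# fine-tune only the classifier on the target language.
model = SLRModel(num_classes=100)
for p in model.embed.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
```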
Related papers
- Language Representations Can be What Recommenders Need: Findings and Potentials [57.90679739598295]
We show that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance.
This outcome suggests the possible homomorphism between the advanced language representation space and an effective item representation space for recommendation.
Our findings highlight the connection between language modeling and behavior modeling, which can inspire both natural language processing and recommender system communities.
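For illustration only, a tiny sketch of the linear mapping idea summarized above, assuming frozen language-model embeddings of item descriptions and dot-product scoring; all dimensions and names are hypothetical.

```python
# Hypothetical sketch: linearly map frozen LM text embeddings of item
# descriptions into an item-embedding space used for dot-product scoring.
import torch
import torch.nn as nn

lm_dim, item_dim, num_items = 768, 64, 1000
lm_item_embeddings = torch.randn(num_items, lm_dim)    # stand-in for LM outputs

projection = nn.Linear(lm_dim, item_dim, bias=False)    # the learned linear map
item_vectors = projection(lm_item_embeddings)           # (num_items, item_dim)

user_vector = torch.randn(item_dim)                      # stand-in user embedding
scores = item_vectors @ user_vector                      # recommendation scores
top5 = scores.topk(5).indices
```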
arXiv Detail & Related papers (2024-07-07T17:05:24Z)
- SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale [22.49602248323602]
A persistent challenge in sign language video processing is how we learn representations of sign language.
Our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body posture of the signer.
Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training.
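A rough sketch of the per-frame, multi-stream idea described above: crop the face and hand regions around detected keypoints and process each frame independently. The helper names, crop sizes, and keypoint keys are assumptions, not the authors' code.

```python
# Illustrative only: crop face/hand regions around keypoints and encode each
# frame independently; crop sizes and keypoint names are assumptions.
import numpy as np

def crop_around(frame: np.ndarray, center_xy, size: int = 64) -> np.ndarray:
    """Square crop centered on a keypoint, clipped to the frame borders."""
    h, w = frame.shape[:2]
    x, y = int(center_xy[0]), int(center_xy[1])
    x0, y0 = max(0, x - size // 2), max(0, y - size // 2)
    x1, y1 = min(w, x0 + size), min(h, y0 + size)
    return frame[y0:y1, x0:x1]

def frame_streams(frame: np.ndarray, keypoints: dict) -> dict:
    """Split one frame into face, left-hand and right-hand streams."""
    return {
        "face": crop_around(frame, keypoints["nose"], size=96),
        "left_hand": crop_around(frame, keypoints["left_wrist"]),
        "right_hand": crop_around(frame, keypoints["right_wrist"]),
    }
```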
arXiv Detail & Related papers (2024-06-11T03:00:41Z)
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
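A minimal sketch of one way such sign-to-sign mappings could be obtained, assuming the isolated SLR model yields one embedding per dictionary sign and using cosine nearest neighbours; this is an illustration, not the paper's procedure.

```python
# Sketch under assumptions: use embeddings from an isolated SLR model to map
# each sign in dictionary A to its nearest neighbour in dictionary B.
import numpy as np

def sign_to_sign_mapping(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """emb_a: (Na, d), emb_b: (Nb, d) sign embeddings.
    Returns, for each sign in A, the index of the most similar sign in B."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    similarity = a @ b.T                      # cosine similarities (Na, Nb)
    return similarity.argmax(axis=1)

mapping = sign_to_sign_mapping(np.random.rand(50, 256), np.random.rand(80, 256))
```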
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
- Efficient Spoken Language Recognition via Multilabel Classification [53.662747523872305]
We show that our models obtain competitive results while being orders of magnitude smaller and faster than current state-of-the-art methods.
Our multilabel strategy is more robust to unseen non-target languages compared to multiclass classification.
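To make the multilabel-versus-multiclass distinction concrete, a small hypothetical sketch: independent sigmoid outputs allow rejecting inputs where no language clears a threshold, whereas softmax always commits to a known class. The threshold and dimensions are assumptions.

```python
# Illustrative contrast between a multiclass and a multilabel language head;
# the threshold value and dimensions are assumptions.
import torch
import torch.nn as nn

features, num_languages = torch.randn(4, 128), 10
multiclass = nn.Linear(128, num_languages)
multilabel = nn.Linear(128, num_languages)

# Multiclass: softmax always commits to one of the known languages.
pred_mc = multiclass(features).softmax(dim=-1).argmax(dim=-1)

# Multilabel: independent sigmoids; if no language clears the threshold,
# the utterance can be rejected as a non-target language.
probs = torch.sigmoid(multilabel(features))
pred_ml = torch.where(probs.max(dim=-1).values > 0.5,
                      probs.argmax(dim=-1),
                      torch.full((features.size(0),), -1))   # -1 = none/unseen
```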
arXiv Detail & Related papers (2023-06-02T23:04:19Z)
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep-learning-based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
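One plausible reading of self-supervised pre-training on hand data is mask-and-reconstruct over hand keypoint sequences; the sketch below is a heavily simplified stand-in for that idea and not the SignBERT+ architecture.

```python
# Simplified, assumption-laden sketch of mask-and-reconstruct pre-training on
# hand keypoint sequences; this is not the actual SignBERT+ architecture.
import torch
import torch.nn as nn

class MaskedPoseReconstructor(nn.Module):
    def __init__(self, num_joints: int = 21, dim: int = 128):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        self.proj_in = nn.Linear(num_joints * 2, dim)
        self.proj_out = nn.Linear(dim, num_joints * 2)

    def forward(self, hands: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # hands: (B, T, num_joints*2); mask: (B, T), True where frames are hidden
        x = hands.masked_fill(mask.unsqueeze(-1), 0.0)
        return self.proj_out(self.encoder(self.proj_in(x)))

model = MaskedPoseReconstructor()
hands = torch.randn(2, 16, 42)
mask = torch.rand(2, 16) < 0.4
loss = nn.functional.mse_loss(model(hands, mask)[mask], hands[mask])
```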
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
- Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [43.45785951443149]
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts.
Current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences.
We tackle large-scale SLP by learning to co-articulate between dictionary signs.
We also propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos.
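As a naive stand-in for the co-articulation idea (the paper learns these transitions rather than interpolating them), a short sketch that linearly blends the poses between two dictionary signs:

```python
# Naive illustration of co-articulating between two dictionary signs by
# linearly blending poses over a short transition; the paper learns this
# transition, so this is only a stand-in for the idea.
import numpy as np

def coarticulate(sign_a: np.ndarray, sign_b: np.ndarray, blend: int = 8) -> np.ndarray:
    """sign_a, sign_b: (T, K, 2) pose sequences. Returns a joined sequence
    with a linear blend from the last pose of A to the first pose of B."""
    alphas = np.linspace(0.0, 1.0, blend)[:, None, None]
    transition = (1 - alphas) * sign_a[-1] + alphas * sign_b[0]
    return np.concatenate([sign_a, transition, sign_b], axis=0)

joined = coarticulate(np.random.rand(20, 50, 2), np.random.rand(25, 50, 2))
```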
arXiv Detail & Related papers (2022-03-29T08:51:38Z)
- Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance with significant margins.
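A generic sketch of multi-modal late fusion, in which per-modality class scores are combined with fixed weights; the modalities, weights, and class count are assumptions, not the SAM-SLR-v2 configuration.

```python
# Sketch of the general idea of a multi-modal late-fusion ensemble: average
# per-modality class scores with weights; modalities and weights are assumptions.
import numpy as np

def ensemble_logits(logits_per_modality: dict, weights: dict) -> np.ndarray:
    """Each entry is a (num_classes,) score vector for one modality."""
    total = sum(weights[name] * logits for name, logits in logits_per_modality.items())
    return total / sum(weights.values())

scores = ensemble_logits(
    {"rgb": np.random.rand(226), "skeleton": np.random.rand(226),
     "depth": np.random.rand(226)},
    {"rgb": 1.0, "skeleton": 1.5, "depth": 0.5})
prediction = int(scores.argmax())
```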
arXiv Detail & Related papers (2021-10-12T16:57:18Z)
- SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition [94.30084702921529]
Hand gestures play a critical role in sign language.
Current deep-learning-based sign language recognition methods may suffer from insufficient interpretability.
We introduce the first self-supervised pre-trainable SignBERT with incorporated hand prior for SLR.
arXiv Detail & Related papers (2021-10-11T16:18:09Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge from subtitled sign language news footage to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)