Learning from What is Already Out There: Few-shot Sign Language
Recognition with Online Dictionaries
- URL: http://arxiv.org/abs/2301.03769v1
- Date: Tue, 10 Jan 2023 03:21:01 GMT
- Title: Learning from What is Already Out There: Few-shot Sign Language
Recognition with Online Dictionaries
- Authors: Matyáš Boháček and Marek Hrúz
- Abstract summary: We open-source the UWB-SL-Wild few-shot dataset, the first training resource of its kind, consisting of dictionary-scraped videos.
We introduce a novel approach to training sign language recognition models in a few-shot scenario, achieving state-of-the-art results.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Today's sign language recognition models require large training corpora of
laboratory-like videos, whose collection involves an extensive workforce and
financial resources. As a result, only a handful of such systems are publicly
available, not to mention their limited localization capabilities for
less-populated sign languages. Utilizing online text-to-video dictionaries,
which inherently hold annotated data across various attributes and sign languages,
and training models in a few-shot fashion therefore poses a promising path for the
democratization of this technology. In this work, we collect and open-source
the UWB-SL-Wild few-shot dataset, the first training resource of its kind,
consisting of dictionary-scraped videos. This dataset represents the actual
distribution and characteristics of available online sign language data. We
select glosses that directly overlap with the already existing datasets
WLASL100 and ASLLVD and share their class mappings to allow for transfer
learning experiments. Apart from providing baseline results on a pose-based
architecture, we introduce a novel approach to training sign language
recognition models in a few-shot scenario, achieving state-of-the-art
results on the ASLLVD-Skeleton and ASLLVD-Skeleton-20 datasets with top-1 accuracy
of 30.97% and 95.45%, respectively.
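As a rough illustration of the few-shot setting described above, the sketch below classifies a query clip by comparing a pose-feature embedding against per-gloss prototypes averaged over a handful of support examples. It is a minimal sketch only: the `embed` function, the pose dimensions, and the nearest-prototype rule are placeholder assumptions, not the authors' architecture or training procedure.

```python
# Minimal few-shot sketch: nearest-prototype gloss classification over pose features.
# Assumptions (not from the paper): pose sequences arrive as (frames, values) arrays,
# and a handful of dictionary-scraped clips per gloss form the support set.
import numpy as np


def embed(pose_seq: np.ndarray) -> np.ndarray:
    """Toy embedding: temporal mean and std of each pose value.

    A real system would use a learned pose-based encoder; this stand-in only
    keeps the sketch runnable.
    """
    return np.concatenate([pose_seq.mean(axis=0), pose_seq.std(axis=0)])


def build_prototypes(support: dict) -> dict:
    """Average the embeddings of the few available clips of each gloss."""
    return {gloss: np.stack([embed(clip) for clip in clips]).mean(axis=0)
            for gloss, clips in support.items()}


def classify(query: np.ndarray, prototypes: dict) -> str:
    """Return the gloss whose prototype is most cosine-similar to the query."""
    q = embed(query)

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    return max(prototypes, key=lambda g: cos(q, prototypes[g]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two glosses, three dictionary clips each: 40 frames x 54 pose values.
    support = {g: [rng.normal(size=(40, 54)) for _ in range(3)] for g in ("BOOK", "HELLO")}
    print(classify(rng.normal(size=(40, 54)), build_prototypes(support)))
```

Swapping the toy embedding for a trained pose encoder and the random support set for UWB-SL-Wild clips would reproduce the overall shape of such a few-shot experiment, though not the paper's specific method.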
Related papers
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content.
Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input.
SHuBERT achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
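The SHuBERT entry above adapts HuBERT-style masked prediction to visual sign language input. The fragment below is a hedged sketch of that objective under simplifying assumptions: per-frame features are already extracted and fused across streams, cluster targets come from an offline k-means step, and all module names and sizes are illustrative rather than taken from the paper.

```python
# Hedged sketch of HuBERT-style masked cluster prediction (sizes are illustrative).
import torch
import torch.nn as nn


class MaskedClusterPredictor(nn.Module):
    def __init__(self, feat_dim: int = 128, n_clusters: int = 500, n_streams: int = 4):
        super().__init__()
        self.mask_embed = nn.Parameter(torch.zeros(feat_dim))  # learned "masked frame" vector
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # One prediction head per stream (e.g. face, hands, body), each with its own cluster vocabulary.
        self.heads = nn.ModuleList(nn.Linear(feat_dim, n_clusters) for _ in range(n_streams))

    def forward(self, feats: torch.Tensor, mask: torch.Tensor):
        # feats: (batch, time, feat_dim); mask: (batch, time) bool, True at masked frames.
        x = torch.where(mask.unsqueeze(-1), self.mask_embed.expand_as(feats), feats)
        h = self.encoder(x)
        return [head(h) for head in self.heads]  # per-stream logits (batch, time, n_clusters)


def masked_prediction_loss(logits_per_stream, targets_per_stream, mask):
    """Cross-entropy on cluster ids, computed only at the masked positions."""
    ce = nn.CrossEntropyLoss()
    return torch.stack([ce(logits[mask], targets[mask])
                        for logits, targets in zip(logits_per_stream, targets_per_stream)]).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = MaskedClusterPredictor()
    feats = torch.randn(2, 32, 128)                               # 2 clips, 32 frames of fused features
    targets = [torch.randint(0, 500, (2, 32)) for _ in range(4)]  # offline k-means cluster ids per stream
    mask = torch.rand(2, 32) < 0.3                                # mask roughly 30% of frames
    loss = masked_prediction_loss(model(feats, mask), targets, mask)
    loss.backward()
    print(float(loss))
```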
- Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets [2.512406961007489]
We use a temporal graph convolution-based sign language recognition approach to evaluate five supervised transfer learning approaches.
Experiments demonstrate that specialized supervised transfer learning methods can improve over finetuning-based transfer learning.
arXiv Detail & Related papers (2024-03-21T16:36:40Z)
- Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
- Towards the extraction of robust sign embeddings for low resource sign language recognition [7.969704867355098]
We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance.
We furthermore achieve better performance with fine-tuned transferred embeddings than with models trained only on the target sign language.
arXiv Detail & Related papers (2023-06-30T11:21:40Z)
- Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures recorded by 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z)
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
- OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages [2.625144209319538]
We introduce OpenHands, a library where we take four key ideas from the NLP community for low-resource languages and apply them to sign languages for word-level recognition.
First, we propose using pose extracted through pretrained models as the standard modality of data to reduce training time and enable efficient inference.
Second, we train and release checkpoints of 4 pose-based isolated sign language recognition models across 6 sign languages, providing baselines and ready-to-deploy checkpoints.
Third, to address the lack of labelled data, we propose self-supervised pretraining on unlabelled data.
arXiv Detail & Related papers (2021-10-12T10:33:02Z)
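The OpenHands entry above treats pose keypoints extracted by a pretrained model as the standard input modality. As a rough illustration of that idea, the snippet below converts RGB frames into fixed-size per-frame pose vectors with MediaPipe Holistic; the choice of extractor, keypoint subset, and layout are assumptions made for the sketch, not details taken from the cited paper.

```python
# Hedged sketch: turn raw RGB video frames into a pose-feature sequence using
# MediaPipe Holistic (an assumed extractor; keypoint layout chosen for illustration).
import mediapipe as mp
import numpy as np


def extract_pose_sequence(frames_rgb) -> np.ndarray:
    """frames_rgb: iterable of HxWx3 uint8 RGB frames -> array of shape (frames, 150)."""
    holistic = mp.solutions.holistic
    rows = []
    with holistic.Holistic(static_image_mode=False) as model:
        for frame in frames_rgb:
            result = model.process(frame)
            coords = []
            # 33 body keypoints plus 21 keypoints per hand, x/y only (150 values per frame).
            for landmarks, count in ((result.pose_landmarks, 33),
                                     (result.left_hand_landmarks, 21),
                                     (result.right_hand_landmarks, 21)):
                if landmarks is None:
                    coords.extend([0.0] * (count * 2))  # keypoints not detected in this frame
                else:
                    for lm in landmarks.landmark:
                        coords.extend([lm.x, lm.y])
            rows.append(coords)
    return np.asarray(rows, dtype=np.float32)


if __name__ == "__main__":
    # Three dummy frames; in practice these would be decoded from a sign video.
    dummy = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(3)]
    print(extract_pose_sequence(dummy).shape)  # (3, 150)
```

Sequences of this kind are what pose-based recognizers, such as the baselines in the main paper and OpenHands-style pipelines, consume, and they are far cheaper to train on than raw video.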
- VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer [76.3906723777229]
We present VidLanKD, a video-language knowledge distillation method for improving language understanding.
We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.
In our experiments, VidLanKD achieves consistent improvements over text-only language models and vokenization models.
arXiv Detail & Related papers (2021-07-06T15:41:32Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and improves WSLR models by transferring knowledge from subtitled news sign videos to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
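Several entries above, and the transfer learning experiments enabled by the shared WLASL100 and ASLLVD class mappings in the main paper, revolve around reusing a model trained on a larger sign language corpus for a smaller one. The sketch below shows the plainest such recipe, freezing a pretrained pose encoder and training only a new classification head; `PoseEncoder` and all sizes are placeholders, not any cited paper's actual model.

```python
# Hedged sketch of finetuning-style transfer: keep a pretrained pose encoder frozen
# and train a fresh head on the few clips available for the target glosses.
import torch
import torch.nn as nn


class PoseEncoder(nn.Module):
    """Placeholder encoder: project per-frame pose vectors and mean-pool over time."""

    def __init__(self, pose_dim: int = 150, hidden: int = 128):
        super().__init__()
        self.proj = nn.Linear(pose_dim, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, frames, pose_dim)
        return self.proj(x).mean(dim=1)                   # (batch, hidden)


def build_finetune_model(pretrained: PoseEncoder, num_target_glosses: int) -> nn.Module:
    for p in pretrained.parameters():
        p.requires_grad = False                           # freeze the transferred encoder
    return nn.Sequential(pretrained, nn.Linear(128, num_target_glosses))  # 128 = encoder output size


if __name__ == "__main__":
    encoder = PoseEncoder()                               # in practice: load pretrained weights here
    model = build_finetune_model(encoder, num_target_glosses=20)
    optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

    poses = torch.randn(4, 40, 150)                       # 4 few-shot clips, 40 frames each
    labels = torch.randint(0, 20, (4,))
    loss = nn.CrossEntropyLoss()(model(poses), labels)
    loss.backward()
    optimizer.step()
    print(float(loss))
```

Unfreezing the encoder turns this into full finetuning, which the cross-dataset transfer learning entry above reports can itself be outperformed by more specialized supervised transfer methods.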