OpenHands: Making Sign Language Recognition Accessible with Pose-based
Pretrained Models across Languages
- URL: http://arxiv.org/abs/2110.05877v1
- Date: Tue, 12 Oct 2021 10:33:02 GMT
- Title: OpenHands: Making Sign Language Recognition Accessible with Pose-based
Pretrained Models across Languages
- Authors: Prem Selvaraj, Gokul NC, Pratyush Kumar, Mitesh Khapra
- Abstract summary: We introduce OpenHands, a library where we take four key ideas from the NLP community for low-resource languages and apply them to sign languages for word-level recognition.
First, we propose using pose extracted through pretrained models as the standard modality of data to reduce training time and enable efficient inference.
Second, we train and release checkpoints of 4 pose-based isolated sign language recognition models across all 6 languages, providing baselines and ready checkpoints for deployment.
Third, to address the lack of labelled data, we propose self-supervised pretraining on unlabelled data.
Fourth, we compare different pretraining strategies and show that pretraining improves fine-tuning performance, especially in low-resource settings, and transfers across sign languages.
- Score: 2.625144209319538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI technologies for Natural Languages have made tremendous progress recently.
However, commensurate progress has not been made on Sign Languages, in
particular, in recognizing signs as individual words or as complete sentences.
We introduce OpenHands, a library where we take four key ideas from the NLP
community for low-resource languages and apply them to sign languages for
word-level recognition. First, we propose using pose extracted through
pretrained models as the standard modality of data to reduce training time and
enable efficient inference, and we release standardized pose datasets for 6
different sign languages - American, Argentinian, Chinese, Greek, Indian, and
Turkish. Second, we train and release checkpoints of 4 pose-based isolated sign
language recognition models across all 6 languages, providing baselines and
ready checkpoints for deployment. Third, to address the lack of labelled data,
we propose self-supervised pretraining on unlabelled data. We curate and
release the largest pose-based pretraining dataset on Indian Sign Language
(Indian-SL). Fourth, we compare different pretraining strategies and for the
first time establish that pretraining is effective for sign language
recognition by demonstrating (a) improved fine-tuning performance especially in
low-resource settings, and (b) high crosslingual transfer from Indian-SL to a
few other sign languages. We open-source all models and datasets in OpenHands
with the hope of making research in sign languages more accessible; everything
is available at https://github.com/AI4Bharat/OpenHands .
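To make the pose-based modality concrete, the sketch below extracts whole-body keypoints from a video with MediaPipe Holistic. It is illustrative only, not the OpenHands pipeline itself; the function name and the (33 body + 21 + 21 hand landmarks) layout are choices of this sketch.

```python
# Illustrative sketch: extract pose keypoints from a sign language video with
# MediaPipe Holistic. Not the OpenHands pipeline itself.
import cv2
import mediapipe as mp
import numpy as np

def extract_pose_sequence(video_path: str) -> np.ndarray:
    """Return an array of shape (num_frames, 75, 3): 33 body + 21 left-hand +
    21 right-hand landmarks per frame as (x, y, z); missing detections -> zeros."""
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        keypoints = []
        for landmarks, n in [(results.pose_landmarks, 33),
                             (results.left_hand_landmarks, 21),
                             (results.right_hand_landmarks, 21)]:
            if landmarks is None:
                keypoints.extend([(0.0, 0.0, 0.0)] * n)  # part not detected
            else:
                keypoints.extend((lm.x, lm.y, lm.z) for lm in landmarks.landmark)
        frames.append(keypoints)
    cap.release()
    holistic.close()
    return np.asarray(frames, dtype=np.float32)
```

A pose sequence of this shape is orders of magnitude smaller than the raw RGB video, which is what makes training and inference cheap enough for ready deployment.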
Related papers
- iSign: A Benchmark for Indian Sign Language Processing [5.967764101493575]
iSign is a benchmark for Indian Sign Language (ISL) processing.
We release one of the largest ISL-English datasets with more than 118K video-sentence/phrase pairs.
We also provide linguistic insights into the workings of ISL alongside the proposed benchmarks.
arXiv Detail & Related papers (2024-07-07T15:07:35Z)
- SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale [22.49602248323602]
A persistent challenge in sign language video processing is how we learn representations of sign language.
Our proposed method focuses on just the most relevant parts of a signing video: the face, hands, and body posture of the signer.
Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training.
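Purely as a hypothetical illustration of the multi-stream idea (not the paper's code), a single frame can be split into localized streams by cropping around keypoint centres; crop_region and the 112-pixel patch size are assumptions of this sketch.

```python
# Hypothetical sketch: split one frame into face/hand/body streams by cropping
# around given keypoint centres. Not the paper's implementation.
import numpy as np

def crop_region(frame: np.ndarray, center_xy, size: int = 112) -> np.ndarray:
    """Crop a size x size patch centred on (x, y), clamped to frame bounds."""
    h, w = frame.shape[:2]
    x, y = int(center_xy[0]), int(center_xy[1])
    x0 = min(max(x - size // 2, 0), max(w - size, 0))
    y0 = min(max(y - size // 2, 0), max(h - size, 0))
    return frame[y0:y0 + size, x0:x0 + size]

def frame_to_streams(frame, face_xy, left_hand_xy, right_hand_xy):
    # One training example per frame: three localized streams plus the full
    # frame, so the model sees face, hands, and overall posture separately.
    return {
        "face": crop_region(frame, face_xy),
        "left_hand": crop_region(frame, left_hand_xy),
        "right_hand": crop_region(frame, right_hand_xy),
        "body": frame,
    }
```

Because every example is a single frame rather than a clip, pre-training batches avoid the memory cost of video sequences.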
arXiv Detail & Related papers (2024-06-11T03:00:41Z)
- Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper [96.43501666278316]
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages.
We employ a Whisper model which can conduct both language identification and audio-based speech recognition.
By comparing VSR models trained on automatic labels with models trained on human-annotated labels, we show that the automatic labels achieve similar VSR performance.
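The general recipe can be sketched with the open-source openai-whisper package; this is an assumption about tooling, not the paper's exact pipeline.

```python
# Sketch of automatic labelling with Whisper: transcribe the audio track of
# unlabelled videos into pseudo-labels for VSR training. Illustrative only.
# Requires: pip install openai-whisper
import whisper

model = whisper.load_model("large-v2")

def auto_label(audio_path: str) -> dict:
    """Return a pseudo-label transcript plus the detected language."""
    result = model.transcribe(audio_path)
    return {"text": result["text"], "language": result["language"]}

# A VSR model is then trained on (video, auto_label(audio)["text"]) pairs
# instead of human-annotated transcripts.
```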
arXiv Detail & Related papers (2023-09-15T16:53:01Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed RAVEn (Raw Audio-Visual Speech Encoders) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that (1) multilingual models trained with more data outperform monolingual ones, but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries [0.0]
We open-source the UWB-SL-Wild few-shot dataset, the first training resource of its kind, consisting of dictionary-scraped videos.
We introduce a novel approach to training sign language recognition models in a few-shot scenario, achieving state-of-the-art results.
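The paper's training recipe is not reproduced here; as a generic illustration of the few-shot setting, the sketch below classifies a query sign by nearest-neighbour search over embeddings of dictionary exemplars (all names are hypothetical).

```python
# Generic few-shot baseline (not the paper's method): nearest-neighbour search
# over embeddings of dictionary-scraped exemplar videos.
import numpy as np

def classify_few_shot(query_emb: np.ndarray,
                      support_embs: np.ndarray,
                      support_labels: list) -> str:
    """query_emb: (D,); support_embs: (N, D); support_labels: N sign glosses.
    Returns the label of the most similar exemplar by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    return support_labels[int(np.argmax(s @ q))]
```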
arXiv Detail & Related papers (2023-01-10T03:21:01Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this inductive bias, expressed as a distribution over model parameters, from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Including Signed Languages in Natural Language Processing [48.62744923724317]
Signed languages are the primary means of communication for many deaf and hard of hearing individuals.
This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact.
arXiv Detail & Related papers (2021-05-11T17:37:55Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
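As a hedged sketch of the ensembling mentioned above (a generic late-fusion scheme, not necessarily the authors' exact procedure), skeleton-based and RGB-based predictions can be combined by averaging class probabilities:

```python
# Late-fusion sketch: combine a skeleton-based model and an RGB-based model by
# a weighted average of their per-class probabilities. Generic and illustrative.
import torch
import torch.nn.functional as F

def ensemble_predict(skeleton_logits: torch.Tensor,
                     rgb_logits: torch.Tensor,
                     weight: float = 0.5) -> torch.Tensor:
    """Both inputs have shape (batch, num_classes); returns predicted class ids."""
    probs = (weight * F.softmax(skeleton_logits, dim=-1)
             + (1.0 - weight) * F.softmax(rgb_logits, dim=-1))
    return probs.argmax(dim=-1)
```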
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- BanglaBERT: Combating Embedding Barrier for Low-Resource Language Understanding [1.7000879291900044]
We build a Bangla natural language understanding model pre-trained on 18.6 GB of data crawled from top Bangla sites on the internet.
Our model outperforms multilingual baselines and previous state-of-the-art results by 1-6%.
We identify a major shortcoming of multilingual models that hurts performance for low-resource languages that do not share a writing script with any high-resource one.
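A minimal sketch of fine-tuning such a checkpoint with Hugging Face Transformers; the Hub identifier and num_labels below are assumptions for illustration.

```python
# Sketch: load a Bangla BERT-style checkpoint for sequence classification.
# The model identifier is an assumed Hugging Face Hub location.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "csebuetnlp/banglabert"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

batch = tokenizer(["বাংলা টেক্সট"], return_tensors="pt",  # example Bangla input
                  padding=True, truncation=True)
outputs = model(**batch)  # outputs.logits: task scores after fine-tuning
```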
arXiv Detail & Related papers (2021-01-01T09:28:45Z)
- Multilingual Jointly Trained Acoustic and Written Word Embeddings [22.63696520064212]
We extend the idea of jointly trained acoustic word embeddings (AWEs) and acoustically grounded word embeddings (AGWEs) to multiple low-resource languages.
We jointly train an AWE model and an AGWE model, using phonetically transcribed data from multiple languages.
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
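As a hedged sketch of the joint training objective (a simplified batch contrastive variant, not the paper's exact multi-view loss), the acoustic embedding of a spoken word can be pulled towards the written embedding of the same word:

```python
# Sketch of a joint AWE/AGWE objective: hinge loss that keeps each matched
# acoustic/written pair closer than mismatched in-batch pairs. Illustrative.
import torch
import torch.nn.functional as F

def joint_embedding_loss(acoustic: torch.Tensor,
                         written: torch.Tensor,
                         margin: float = 0.4) -> torch.Tensor:
    """acoustic, written: (B, D) L2-normalised embeddings; row i of each
    matrix embeds the same word type."""
    sim = acoustic @ written.t()          # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)         # similarities of matched pairs
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # Penalise any mismatched pair that comes within `margin` of the match.
    return F.relu(margin - pos + sim)[off_diag].mean()
```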
arXiv Detail & Related papers (2020-06-24T19:16:02Z)