Advanced Arabic Alphabet Sign Language Recognition Using Transfer Learning and Transformer Models
- URL: http://arxiv.org/abs/2410.00681v1
- Date: Tue, 1 Oct 2024 13:39:26 GMT
- Title: Advanced Arabic Alphabet Sign Language Recognition Using Transfer Learning and Transformer Models
- Authors: Mazen Balat, Rewaa Awaad, Hend Adel, Ahmed B. Zaky, Salah A. Aly,
- Abstract summary: This paper presents an Arabic Alphabet Sign Language recognition approach, using deep learning methods in conjunction with transfer learning and transformer-based models.
We study the performance of the different variants on two publicly available datasets, namely ArSL2018 and AASL.
Experimental results present evidence that the suggested methodology can receive a high recognition accuracy, by up to 99.6% and 99.43% on ArSL2018 and AASL, respectively.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper presents an Arabic Alphabet Sign Language recognition approach, using deep learning methods in conjunction with transfer learning and transformer-based models. We study the performance of the different variants on two publicly available datasets, namely ArSL2018 and AASL. This task will make full use of state-of-the-art CNN architectures like ResNet50, MobileNetV2, and EfficientNetB7, and the latest transformer models such as Google ViT and Microsoft Swin Transformer. These pre-trained models have been fine-tuned on the above datasets in an attempt to capture some unique features of Arabic sign language motions. Experimental results present evidence that the suggested methodology can receive a high recognition accuracy, by up to 99.6\% and 99.43\% on ArSL2018 and AASL, respectively. That is far beyond the previously reported state-of-the-art approaches. This performance opens up even more avenues for communication that may be more accessible to Arabic-speaking deaf and hard-of-hearing, and thus encourages an inclusive society.
Related papers
- Arabic Tweet Act: A Weighted Ensemble Pre-Trained Transformer Model for
Classifying Arabic Speech Acts on Twitter [0.32885740436059047]
This paper proposes a Twitter dialectal Arabic speech act classification approach based on a transformer deep learning neural network.
We proposed a BERT based weighted ensemble learning approach to integrate the advantages of various BERT models in dialectal Arabic speech acts classification.
The results show that the best BERT model is araBERTv2-Twitter models with a macro-averaged F1 score and an accuracy of 0.73 and 0.84, respectively.
arXiv Detail & Related papers (2024-01-30T19:01:24Z) - Design of Arabic Sign Language Recognition Model [0.0]
The model is tested on ArASL 2018, consisting of 54,000 images for 32 alphabet signs gathered from 40 signers.
The future work will be a model that converts Arabic sign language into Arabic text.
arXiv Detail & Related papers (2023-01-06T19:19:25Z) - Robotic Skill Acquisition via Instruction Augmentation with
Vision-Language Models [70.82705830137708]
We introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL)
We utilize semi-language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data.
DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.
arXiv Detail & Related papers (2022-11-21T18:56:00Z) - Sign Language to Text Conversion in Real Time using Transfer Learning [0.0]
We propose a deep learning model trained on the American Sign Language.
There has been an improvement in accuracy from 94% of CNN to 98.7% by Transfer Learning.
arXiv Detail & Related papers (2022-11-13T17:20:19Z) - Language-agnostic Code-Switching in Sequence-To-Sequence Speech
Recognition [62.997667081978825]
Code-Switching (CS) is referred to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are transcribed.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5,03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z) - XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by cross-modal encoders' success in visual-language tasks while we alter the learning objective to cater to the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z) - Align before Fuse: Vision and Language Representation Learning with
Momentum Distillation [52.40490994871753]
We introduce a contrastive loss to representations BEfore Fusing (ALBEF) through cross-modal attention.
We propose momentum distillation, a self-training method which learns from pseudo-targets produced by a momentum model.
ALBEF achieves state-of-the-art performance on multiple downstream vision-language tasks.
arXiv Detail & Related papers (2021-07-16T00:19:22Z) - VidLanKD: Improving Language Understanding via Video-Distilled Knowledge
Transfer [76.3906723777229]
We present VidLanKD, a video-language knowledge distillation method for improving language understanding.
We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.
In our experiments, VidLanKD achieves consistent improvements over text-only language models and vokenization models.
arXiv Detail & Related papers (2021-07-06T15:41:32Z) - Read Like Humans: Autonomous, Bidirectional and Iterative Language
Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z) - AraGPT2: Pre-Trained Transformer for Arabic Language Generation [0.0]
We develop the first advanced Arabic language generation model, AraGPT2, trained from scratch on a large Arabic corpus of internet text and news articles.
Our largest model, AraGPT2-mega, has 1.46 billion parameters, which makes it the largest Arabic language model available.
For text generation, our best model achieves a perplexity of 29.8 on held-out Wikipedia articles.
arXiv Detail & Related papers (2020-12-31T09:48:05Z) - AraELECTRA: Pre-Training Text Discriminators for Arabic Language
Understanding [0.0]
We develop an Arabic language representation model, which we name AraELECTRA.
Our model is pretrained using the replaced token detection objective on large Arabic text corpora.
We show that AraELECTRA outperforms current state-of-the-art Arabic language representation models, given the same pretraining data and with even a smaller model size.
arXiv Detail & Related papers (2020-12-31T09:35:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.