Towards Online Sign Language Recognition and Translation
- URL: http://arxiv.org/abs/2401.05336v1
- Date: Wed, 10 Jan 2024 18:59:53 GMT
- Title: Towards Online Sign Language Recognition and Translation
- Authors: Ronglai Zuo, Fangyun Wei, Brian Mak
- Abstract summary: We develop a sign language dictionary encompassing all glosses present in a target sign language dataset.
We train an isolated sign language recognition model on augmented signs using both conventional classification loss and our novel saliency loss.
Our online recognition model can be extended to boost the performance of any offline model.
- Score: 41.85360877354916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of sign language recognition is to bridge the communication gap
between the deaf and the hearing. Numerous previous works train their models
using the well-established connectionist temporal classification (CTC) loss.
During the inference stage, the CTC-based models typically take the entire sign
video as input to make predictions. This type of inference scheme is referred
to as offline recognition. In contrast, while mature speech recognition systems
can efficiently recognize spoken words on the fly, sign language recognition
still falls short due to the lack of practical online solutions. In this work,
we take the first step towards filling this gap. Our approach comprises three
phases: 1) developing a sign language dictionary encompassing all glosses
present in a target sign language dataset; 2) training an isolated sign
language recognition model on augmented signs using both conventional
classification loss and our novel saliency loss; 3) employing a sliding window
approach on the input sign sequence and feeding each sign clip to the
well-optimized model for online recognition. Furthermore, our online
recognition model can be extended to boost the performance of any offline
model, and to support online translation by appending a gloss-to-text network
onto the recognition model. By integrating our online framework with the
previously best-performing offline model, TwoStream-SLR, we achieve new
state-of-the-art performance on three benchmarks: Phoenix-2014, Phoenix-2014T,
and CSL-Daily. Code and models will be available at
https://github.com/FangyunWei/SLRT
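As a rough sketch of the sliding-window inference in phase 3 (the window size, stride, model interface, and the duplicate-collapsing post-processing below are illustrative assumptions, not the paper's actual settings):

```python
import torch

def online_recognition(frames, islr_model, window_size=16, stride=8):
    """Slide a fixed-size window over an incoming frame sequence and feed
    each clip to a trained isolated-sign recognition model (phase 3).
    `frames` is a list of (C, H, W) tensors; `islr_model` maps a
    (1, T, C, H, W) clip to per-gloss logits. Both are assumptions."""
    predictions = []
    for start in range(0, len(frames) - window_size + 1, stride):
        clip = torch.stack(frames[start:start + window_size])  # (T, C, H, W)
        with torch.no_grad():
            logits = islr_model(clip.unsqueeze(0))             # (1, num_glosses)
        predictions.append(logits.argmax(dim=-1).item())
    # Collapse consecutive duplicate predictions into a gloss sequence,
    # a simple post-processing heuristic assumed here, not the paper's method.
    return [g for i, g in enumerate(predictions)
            if i == 0 or g != predictions[i - 1]]
```

Appending a gloss-to-text network to the output of such a loop would give the online translation extension the abstract describes.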
Related papers
- MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production [93.32354378820648]
We propose a unified framework for continuous sign language production, easing communication between sign language users and non-signers.
A sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step.
Experiments on the How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
arXiv Detail & Related papers (2024-07-04T13:53:50Z)
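As a generic illustration of the step-by-step diffusion generation described above (a textbook DDPM-style sampling loop with assumed names, shapes, and noise schedule, not MS2SL's published architecture):

```python
import torch

def sample_pose_sequence(denoiser, cond, seq_len=64, pose_dim=150, steps=50):
    """Textbook DDPM ancestral sampling of a pose sequence, conditioned on
    a text/speech embedding `cond`. The denoiser interface, shapes, and
    schedule are assumptions for illustration only."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, seq_len, pose_dim)              # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), cond)     # predicted noise component
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t])                    # posterior mean step
        if t > 0:                                      # add noise except at t=0
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x                                           # generated pose sequence
```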
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
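The "first-order motion information" above is, at its simplest, the temporal difference of joint positions; a minimal sketch under assumed tensor shapes:

```python
import torch

def motion_from_joints(joints):
    """Derive a first-order motion stream from a keypoint sequence.
    `joints`: (T, K, 2) tensor of K 2-D keypoints over T frames (an
    assumed layout). Returns (T, K, 2) frame-to-frame displacements,
    zero-padded at t=0 so both streams stay temporally aligned."""
    motion = joints[1:] - joints[:-1]          # (T-1, K, 2) displacements
    pad = torch.zeros_like(joints[:1])         # zero motion for the first frame
    return torch.cat([pad, motion], dim=0)     # (T, K, 2)
```

The joint and motion streams could then serve as the complementary views for the contrastive objective.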
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
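A schematic of the mapping step: classify each isolated sign from one language's dictionary with an ISLR model trained on the other language, and keep confident argmax matches (the threshold and data structures are assumptions, not the paper's):

```python
import torch

def build_sign_mappings(src_dictionary, islr_model, tgt_glosses, min_conf=0.9):
    """Map isolated signs of a source sign language to glosses of a target
    one via an ISLR model trained on the target language. Schematic only."""
    mappings = {}
    for src_gloss, clip in src_dictionary.items():     # clip: (1, T, C, H, W)
        with torch.no_grad():
            probs = islr_model(clip).softmax(dim=-1)   # (1, num_tgt_glosses)
        conf, idx = probs.max(dim=-1)
        if conf.item() >= min_conf:                    # keep confident matches only
            mappings[src_gloss] = tgt_glosses[idx.item()]
    return mappings
```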
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep learning based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
- Natural Language-Assisted Sign Language Recognition [28.64871971445024]
We propose the Natural Language-Assisted Sign Language Recognition framework.
It exploits semantic information contained in glosses (sign labels) to mitigate the problem of visually indistinguishable signs (VISigns) in sign languages.
Our method achieves state-of-the-art performance on three widely-adopted benchmarks: MSASL, WLASL, and NMFs-CSL.
arXiv Detail & Related papers (2023-03-21T17:59:57Z)
- Fine-tuning of sign language recognition models: a technical report [0.0]
We focus on investigating two questions: how fine-tuning on datasets from other sign languages helps improve sign recognition quality, and whether sign recognition is possible in real time without using a GPU.
We provide code for reproducing model training experiments, converting models to ONNX format, and inference for real-time gesture recognition.
arXiv Detail & Related papers (2023-02-15T14:36:18Z)
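The ONNX conversion and CPU-only real-time inference mentioned above follow the standard PyTorch and onnxruntime APIs; a minimal sketch with a placeholder model and input shape:

```python
import numpy as np
import onnxruntime as ort
import torch

# Placeholder model and input shape; a real sign recognizer and its
# preprocessing would go here instead.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 224 * 224, 10)).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "sign_model.onnx",
                  input_names=["frames"], output_names=["logits"])

# CPU inference with onnxruntime, e.g. inside a webcam capture loop.
session = ort.InferenceSession("sign_model.onnx",
                               providers=["CPUExecutionProvider"])
frame = np.random.randn(1, 3, 224, 224).astype(np.float32)  # stand-in frame
(logits,) = session.run(None, {"frames": frame})
pred = int(logits.argmax())
```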
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation [Jin et al., 2020], we propose recognizing sign language based on whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.