SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign
Language Recognition
- URL: http://arxiv.org/abs/2110.05382v1
- Date: Mon, 11 Oct 2021 16:18:09 GMT
- Title: SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign
Language Recognition
- Authors: Hezhen Hu, Weichao Zhao, Wengang Zhou, Yuechen Wang, Houqiang Li
- Abstract summary: Hand gestures play a critical role in sign language.
Current deep-learning-based sign language recognition methods may suffer from insufficient interpretability.
We introduce the first self-supervised pre-trainable SignBERT with incorporated hand prior for SLR.
- Score: 94.30084702921529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hand gestures play a critical role in sign language. Current
deep-learning-based sign language recognition (SLR) methods may suffer from
insufficient interpretability and overfitting due to limited sign data sources.
In this paper, we introduce the first self-supervised pre-trainable SignBERT
with incorporated hand prior for SLR. SignBERT views the hand pose as a visual
token, which is derived from an off-the-shelf pose extractor. The visual tokens
are then embedded with gesture state, temporal and hand chirality information.
To take full advantage of available sign data sources, SignBERT first performs
self-supervised pre-training by masking and reconstructing visual tokens.
Together with several mask modeling strategies, we incorporate the hand prior in
a model-aware manner to better capture hierarchical context over the hand
sequence. Then, with a prediction head added, SignBERT is fine-tuned to
perform the downstream SLR task. To validate the effectiveness of our method on
SLR, we perform extensive experiments on four public benchmark datasets, i.e.,
NMFs-CSL, SLR500, MSASL and WLASL. Experimental results demonstrate the
effectiveness of both the self-supervised pre-training and the incorporated hand prior.
Furthermore, we achieve state-of-the-art performance on all benchmarks with a
notable gain.
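To make the pre-training objective described in the abstract more concrete, the following is a minimal, illustrative PyTorch sketch of masking and reconstructing hand-pose visual tokens with temporal and chirality embeddings. It is not the authors' released code: it omits the gesture-state embedding, the hand-model-aware decoder, and the hierarchical masking strategies, and all module and parameter names (e.g. `MaskedHandTokenPretrainer`, `mask_ratio`) are assumptions made for illustration.

```python
# Minimal sketch (assumed, not the authors' implementation) of masked hand-pose
# token pre-training: per-frame hand poses are embedded as tokens, a fraction of
# tokens is masked, and a transformer encoder is trained to reconstruct the
# masked poses.
import torch
import torch.nn as nn

class MaskedHandTokenPretrainer(nn.Module):
    def __init__(self, num_joints=21, d_model=256, n_layers=6, n_heads=8, max_len=512):
        super().__init__()
        # Each frame's 2D hand pose (num_joints x 2) is flattened into one visual token.
        self.token_embed = nn.Linear(num_joints * 2, d_model)
        self.temporal_embed = nn.Embedding(max_len, d_model)   # frame index
        self.chirality_embed = nn.Embedding(2, d_model)        # left / right hand
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reconstruct = nn.Linear(d_model, num_joints * 2)  # pose regression head

    def forward(self, poses, chirality, mask_ratio=0.3):
        # poses: (B, T, num_joints, 2); chirality: (B, T) with values in {0, 1}
        B, T = poses.shape[:2]
        tokens = self.token_embed(poses.flatten(2))            # (B, T, d_model)
        frame_idx = torch.arange(T, device=poses.device)
        tokens = tokens + self.temporal_embed(frame_idx) + self.chirality_embed(chirality)

        # Randomly mask a fraction of the frame tokens.
        mask = torch.rand(B, T, device=poses.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, T, -1), tokens)

        hidden = self.encoder(tokens)                          # (B, T, d_model)
        recon = self.reconstruct(hidden).view(B, T, -1, 2)     # predicted poses

        # Reconstruction loss is computed only on the masked positions.
        return ((recon - poses) ** 2)[mask].mean()
```

For fine-tuning on downstream SLR, the reconstruction head would be replaced by a classification head over sign classes, as the abstract describes.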
Related papers
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
We introduce SHuBERT, a self-supervised transformer encoder that learns strong representations from American Sign Language (ASL) video content.
Inspired by the success of the HuBERT speech representation model, SHuBERT adapts masked prediction for multi-stream visual sign language input.
SHuBERT achieves state-of-the-art performance across multiple benchmarks.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
- Towards the extraction of robust sign embeddings for low resource sign language recognition [7.969704867355098]
We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance.
Furthermore, fine-tuned transferred embeddings outperform models trained only on the target sign language.
arXiv Detail & Related papers (2023-06-30T11:21:40Z)
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep-learning-based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and improves WSLR models by transferring knowledge of subtitled news sign language to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)