SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign
Language Understanding
- URL: http://arxiv.org/abs/2305.04868v1
- Date: Mon, 8 May 2023 17:16:38 GMT
- Title: SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign
Language Understanding
- Authors: Hezhen Hu, Weichao Zhao, Wengang Zhou, Houqiang Li
- Abstract summary: Hand gestures play a crucial role in the expression of sign language.
Current deep-learning-based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose SignBERT+, the first self-supervised pre-trainable framework with a model-aware hand prior incorporated.
- Score: 132.78015553111234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hand gestures play a crucial role in the expression of sign language.
Current deep-learning-based methods for sign language understanding (SLU) are
prone to over-fitting due to insufficient sign data resources and suffer from
limited interpretability. In this paper, we propose SignBERT+, the first
self-supervised pre-trainable framework with a model-aware hand prior incorporated. In
our framework, the hand pose is regarded as a visual token, which is derived
from an off-the-shelf detector. Each visual token is embedded with gesture
state and spatial-temporal position encoding. To take full advantage of the
available sign data resources, we first perform self-supervised learning to
model their statistics. To this end, we design multi-level masked modeling
strategies (joint, frame, and clip) to mimic common detection failure cases.
Jointly with these masked modeling strategies, we incorporate the model-aware
hand prior to better capture hierarchical context over the sequence. After the
pre-training,
we carefully design simple yet effective prediction heads for downstream tasks.
To validate the effectiveness of our framework, we perform extensive
experiments on three main SLU tasks, involving isolated and continuous sign
language recognition (SLR), and sign language translation (SLT). Experimental
results demonstrate the effectiveness of our method, achieving new
state-of-the-art performance with a notable gain.
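The abstract names the masking granularities (joint, frame, and clip) but not their mechanics. As a hedged illustration only, the PyTorch sketch below shows one way such multi-level corruption of a hand-pose token sequence could look; the function name `mask_pose_sequence`, the masking ratios, and all tensor shapes are assumptions rather than the authors' implementation.

```python
import torch

def mask_pose_sequence(pose, p_joint=0.1, p_frame=0.05, clip_len=4):
    """Corrupt a hand-pose token sequence at three granularities.

    pose: (T, J, 2) tensor -- T frames, J joints, (x, y) coordinates.
    The masking ratios and clip length are illustrative guesses, not
    values from the paper.
    """
    T, J, _ = pose.shape
    hidden = torch.zeros(T, J, dtype=torch.bool)

    # Joint level: hide random individual joints, mimicking per-joint
    # failures of the off-the-shelf pose detector.
    hidden |= torch.rand(T, J) < p_joint

    # Frame level: hide every joint in randomly chosen frames.
    hidden |= (torch.rand(T) < p_frame)[:, None]

    # Clip level: hide a contiguous run of frames (e.g. motion blur).
    start = int(torch.randint(0, max(T - clip_len, 1), (1,)))
    hidden[start:start + clip_len] = True

    corrupted = pose.masked_fill(hidden[..., None], 0.0)  # zero "mask token"
    return corrupted, hidden

corrupted, hidden = mask_pose_sequence(torch.randn(64, 21, 2))
```

Pre-training would then ask the model to reconstruct the hidden joints, with the hand prior steering reconstructions toward plausible hand configurations.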
Related papers
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling (see the sketch after this entry).
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
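The entry above introduces first-order motion information without defining its form. A common reading, offered here purely as an assumption, is frame-to-frame differencing of the joint stream:

```python
import torch

def first_order_motion(joints):
    """Derive a motion stream from a joint stream by temporal differencing.

    joints: (T, J, 2) keypoint sequence. Returns (T, J, 2) frame-to-frame
    displacements, zero-padded at t=0 so both streams stay time-aligned.
    """
    motion = joints[1:] - joints[:-1]    # first-order differences
    pad = torch.zeros_like(joints[:1])   # keep the sequence length at T
    return torch.cat([pad, motion], dim=0)

joints = torch.randn(64, 21, 2)
motion = first_order_motion(joints)      # complementary modality to joints
```

The two streams would then serve as complementary views for the contrastive objective.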
- SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale [22.49602248323602]
A persistent challenge in sign language video processing is how we learn representations of sign language.
Our proposed method focuses on just the most relevant parts in a signing video: the face, hands, and body posture of the signer (see the sketch after this entry).
Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training.
arXiv Detail & Related papers (2024-06-11T03:00:41Z)
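As a rough, assumption-laden illustration of the multi-stream, per-frame idea above, the hypothetical module below encodes three region crops (face, hands, body) of a single frame into one token; the encoder design, crop handling, and fusion are guesses, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiStreamFrameEncoder(nn.Module):
    """Hypothetical per-frame encoder over face, hand, and body crops."""
    def __init__(self, dim=256):
        super().__init__()
        def stream():
            # One small CNN per region; the paper's encoders are unspecified.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, dim),
            )
        self.face, self.hands, self.body = stream(), stream(), stream()
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, face, hands, body):
        # Each argument is a (B, 3, H, W) crop of the corresponding region.
        feats = [self.face(face), self.hands(hands), self.body(body)]
        return self.fuse(torch.cat(feats, dim=-1))  # one token per frame

token = MultiStreamFrameEncoder()(
    torch.randn(1, 3, 64, 64),   # face crop
    torch.randn(1, 3, 64, 64),   # hands crop
    torch.randn(1, 3, 96, 96),   # body crop
)
```

Because each frame is processed independently, such a setup avoids the cost of video-level pre-training, matching the efficiency claim above.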
- Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining [0.6144680854063939]
The state-of-the-art Conformer model from speech recognition is adapted for continuous sign language recognition (see the sketch after this entry).
This marks the first instance of employing Conformer for a vision-based task.
Unsupervised pretraining is conducted on a curated sign language dataset.
arXiv Detail & Related papers (2024-05-20T13:40:52Z)
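The entry above gives no architectural detail beyond adapting a speech Conformer. One plausible setup, sketched as an assumption here, feeds per-frame pose or visual features to `torchaudio.models.Conformer` with a CTC head over a gloss vocabulary:

```python
import torch
import torch.nn as nn
from torchaudio.models import Conformer

class ConformerCSLR(nn.Module):
    """Hypothetical continuous SLR model: Conformer encoder + CTC head."""
    def __init__(self, feat_dim=128, num_glosses=1000):
        super().__init__()
        self.encoder = Conformer(
            input_dim=feat_dim, num_heads=4, ffn_dim=256,
            num_layers=6, depthwise_conv_kernel_size=31,
        )
        self.head = nn.Linear(feat_dim, num_glosses + 1)  # +1 for CTC blank

    def forward(self, feats, lengths):
        # feats: (B, T, feat_dim) per-frame features; lengths: (B,) valid T.
        enc, enc_lengths = self.encoder(feats, lengths)
        return self.head(enc).log_softmax(-1), enc_lengths

model = ConformerCSLR()
logp, out_lens = model(torch.randn(2, 100, 128), torch.tensor([100, 80]))
```

The log-probabilities and lengths plug directly into `torch.nn.CTCLoss`, the usual objective for continuous SLR.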
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method, RdSca, which combines demonstration replay with sliding causal attention (see the sketch after this entry).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
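The RdSca entry names sliding causal attention without spelling it out. The mask below is a generic sliding-window causal mask, given only to make the term concrete; whether it matches the paper's exact construction is an assumption.

```python
import torch

def sliding_causal_mask(seq_len, window):
    """Boolean attention mask where position i may attend to position j
    only when i - window < j <= i (causal and within a sliding window).
    A generic construction; the paper's scheme may differ."""
    i = torch.arange(seq_len)[:, None]
    j = torch.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)  # (seq_len, seq_len), True = attend

mask = sliding_causal_mask(8, window=3)
```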
- Towards the extraction of robust sign embeddings for low resource sign language recognition [7.969704867355098]
We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance.
We furthermore achieve better performance with fine-tuned transferred embeddings than with models trained only on the target sign language.
arXiv Detail & Related papers (2023-06-30T11:21:40Z)
- BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization [135.73436686653315]
We are dedicated to leveraging the BERT pre-training success and modeling the domain-specific statistics to fertilize the sign language recognition (SLR) model.
Considering the dominance of hand and body in sign language expression, we organize them as pose triplet units and feed them into the Transformer backbone.
Pre-training is performed by reconstructing the masked triplet unit from the corrupted input sequence.
It adaptively extracts a discrete pseudo label from the pose triplet unit, which represents the semantic gesture/body state (see the sketch after this entry).
arXiv Detail & Related papers (2023-02-10T06:23:44Z)
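BEST's coupling tokenization is only outlined above: both hands and the body form a pose triplet unit, a discrete pseudo label is extracted per unit, and pre-training reconstructs masked units. The sketch below shows just that masking-and-pseudo-label plumbing with a placeholder nearest-neighbor codebook; every name, shape, and ratio is an assumption.

```python
import torch
import torch.nn as nn

class TripletMaskedPretext(nn.Module):
    """Illustrative masked-unit pretext over (left hand, right hand, body)
    triplet units, with a placeholder codebook standing in for the
    adaptive discrete pseudo labels described above."""
    def __init__(self, unit_dim=96, num_codes=512):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, unit_dim)

    def pseudo_label(self, units):
        # Nearest codebook entry per unit acts as its discrete target.
        dists = torch.cdist(units, self.codebook.weight[None])  # (B, T, C)
        return dists.argmin(-1)

    def corrupt(self, units, p=0.15):
        # Mask whole triplet units; the model must recover their labels.
        mask = torch.rand(units.shape[:2]) < p
        return units.masked_fill(mask[..., None], 0.0), mask

units = torch.randn(2, 50, 96)        # (batch, time, triplet-unit feature)
pretext = TripletMaskedPretext()
labels = pretext.pseudo_label(units)  # targets for masked prediction
corrupted, mask = pretext.corrupt(units)
```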
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability under limited data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition [94.30084702921529]
Hand gestures play a critical role in sign language.
Current deep-learning-based sign language recognition methods may suffer from insufficient interpretability.
We introduce SignBERT, the first self-supervised pre-trainable model with an incorporated hand prior for SLR.
arXiv Detail & Related papers (2021-10-11T16:18:09Z)