BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization
- URL: http://arxiv.org/abs/2302.05075v3
- Date: Tue, 28 Mar 2023 02:02:04 GMT
- Title: BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization
- Authors: Weichao Zhao, Hezhen Hu, Wengang Zhou, Jiaxin Shi, Houqiang Li
- Abstract summary: We are dedicated to leveraging the BERT pre-training success and modeling the domain-specific statistics to fertilize the sign language recognition (SLR) model.
Considering the dominance of hand and body in sign language expression, we organize them as pose triplet units and feed them into the Transformer backbone.
Pre-training is performed via reconstructing the masked triplet unit from the corrupted input sequence.
It adaptively extracts the discrete pseudo label from the pose triplet unit, which represents the semantic gesture/body state.
- Score: 135.73436686653315
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this work, we are dedicated to leveraging the BERT pre-training success
and modeling the domain-specific statistics to fertilize the sign language
recognition (SLR) model. Considering the dominance of hand and body in sign
language expression, we organize them as pose triplet units and feed them into
the Transformer backbone in a frame-wise manner. Pre-training is performed via
reconstructing the masked triplet unit from the corrupted input sequence, which
learns the hierarchical correlation context cues among internal and external
triplet units. Notably, different from the highly semantic word token in BERT,
the pose unit is a low-level signal originally located in continuous space,
which prevents the direct adoption of the BERT cross-entropy objective. To this
end, we bridge this semantic gap via coupling tokenization of the triplet unit.
It adaptively extracts the discrete pseudo label from the pose triplet unit,
which represents the semantic gesture/body state. After pre-training, we
fine-tune the pre-trained encoder on the downstream SLR task, jointly with the
newly added task-specific layer. Extensive experiments are conducted to
validate the effectiveness of our proposed method, achieving new
state-of-the-art performance on all four benchmarks with a notable gain.
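The key steps above (frame-wise pose triplet units, coupling tokenization into discrete pseudo labels, and masked reconstruction trained with cross-entropy) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch outline rather than the authors' implementation: the tokenizer is a plain nearest-codeword (VQ-style) lookup standing in for the paper's coupling tokenizer, and all module names, dimensions, and the masking ratio are assumptions.

```python
# Minimal sketch (assumed shapes and names, not the official BEST code):
# frame-wise pose triplet units -> discrete pseudo labels via a learned
# codebook -> masked prediction trained with a cross-entropy objective.
# Positional encodings and data loading are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PoseTripletTokenizer(nn.Module):
    """Maps a continuous pose triplet unit to a discrete pseudo label by
    nearest-codeword lookup (a VQ-style stand-in for coupling tokenization)."""
    def __init__(self, unit_dim=3 * 64, codebook_size=1024):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(codebook_size, unit_dim))

    @torch.no_grad()
    def forward(self, units):                         # units: (B, T, unit_dim)
        cb = self.codebook.unsqueeze(0).expand(units.size(0), -1, -1)
        return torch.cdist(units, cb).argmin(dim=-1)  # pseudo labels: (B, T)


class MaskedTripletPretrainer(nn.Module):
    def __init__(self, unit_dim=3 * 64, d_model=256, codebook_size=1024, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(unit_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, codebook_size)  # predicts pseudo labels

    def forward(self, units, mask):                 # mask: (B, T) bool, True = masked
        x = self.embed(units)
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return self.head(self.encoder(x))           # logits: (B, T, codebook_size)


# Toy pre-training step: each "triplet unit" here concatenates (assumed)
# left-hand, right-hand and body pose features of one frame.
B, T, unit_dim = 2, 16, 3 * 64
units = torch.randn(B, T, unit_dim)
mask = torch.rand(B, T) < 0.5                       # corrupt a subset of frames

tokenizer = PoseTripletTokenizer(unit_dim)
model = MaskedTripletPretrainer(unit_dim)

pseudo_labels = tokenizer(units)                    # (B, T) discrete targets
logits = model(units, mask)
loss = F.cross_entropy(logits[mask], pseudo_labels[mask])  # masked positions only
loss.backward()
```

After pre-training, the same encoder would be fine-tuned on labeled SLR data with a newly added classification layer in place of the pseudo-label head, mirroring the fine-tuning step described in the abstract.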
Related papers
- LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition [56.22672276092373]
Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in facial expression recognition.
We propose a unified framework termed hierarchicaL dEcoupling And Fusing to coordinate expression-relevant representations and pseudo-labels.
We show that LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data.
arXiv Detail & Related papers (2024-04-23T13:43:33Z)
- Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning [80.08139343603956]
In cross-lingual named entity recognition, self-training is commonly used to bridge the linguistic gap.
In this work, we aim to improve self-training for cross-lingual NER by combining representation learning and pseudo label refinement.
Our proposed method, namely ContProto, mainly comprises two components: (1) contrastive self-training and (2) prototype-based pseudo-labeling; a sketch of prototype-based pseudo-labeling appears after this list.
arXiv Detail & Related papers (2023-05-23T02:52:16Z)
- SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep learning based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
- PA-Seg: Learning from Point Annotations for 3D Medical Image Segmentation using Contextual Regularization and Cross Knowledge Distillation [14.412073730567137]
We propose to annotate a segmentation target with only seven points in 3D medical images, and design a two-stage weakly supervised learning framework PA-Seg.
In the first stage, we employ geodesic distance transform to expand the seed points to provide more supervision signal.
In the second stage, we use predictions obtained by the model pre-trained in the first stage as pseudo labels.
arXiv Detail & Related papers (2022-08-11T07:00:33Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation; a sketch of the sentence-unshuffling setup appears after this list.
We show that this objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Coarse-to-Fine Pre-training for Named Entity Recognition [26.00489191164784]
We propose a NER-specific pre-training framework to inject coarse-to-fine automatically mined entity knowledge into pre-trained models.
Our framework achieves significant improvements against several pre-trained baselines, establishing the new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2020-10-16T07:39:20Z)
- Translate Reverberated Speech to Anechoic Ones: Speech Dereverberation with BERT [6.876734825043823]
Single channel speech dereverberation is considered in this work.
Inspired by the recent success of the Bidirectional Encoder Representations from Transformers (BERT) model in the domain of Natural Language Processing (NLP), we investigate its applicability as the backbone sequence model to enhance reverberated speech signals.
arXiv Detail & Related papers (2020-07-16T00:45:27Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
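For the ContProto entry above, the prototype-based pseudo-labeling component can be illustrated generically: class prototypes are computed from the current embeddings and their noisy pseudo labels, and each unlabeled example is then reassigned to its nearest prototype. This is a minimal sketch of that general technique under assumed shapes and names, not the ContProto implementation.

```python
# Generic prototype-based pseudo-label refinement (illustrative only;
# shapes, names and the toy data are assumptions, not ContProto code).
import torch
import torch.nn.functional as F


def refine_pseudo_labels(embeddings, pseudo_labels, num_classes):
    """embeddings: (N, D) features from the current model;
    pseudo_labels: (N,) noisy pseudo labels for unlabeled data.
    Assumes every class appears at least once among the pseudo labels."""
    prototypes = torch.stack([
        embeddings[pseudo_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                                   # (C, D) class prototypes
    sims = F.cosine_similarity(
        embeddings.unsqueeze(1), prototypes.unsqueeze(0), dim=-1
    )                                                    # (N, C) similarities
    return sims.argmax(dim=-1)                           # refined pseudo labels


# Toy usage with random features and labels.
emb = torch.randn(100, 32)
noisy = torch.randint(0, 5, (100,))
refined = refine_pseudo_labels(emb, noisy, num_classes=5)
```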
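For the SLM entry, the sentence-unshuffling objective is easiest to see from how a training example is built: the sentences of a passage are shuffled, and the model is trained to recover the original order. The helper below is a hypothetical sketch of that data construction, not the SLM code; names and the example passage are made up.

```python
# Build one sentence-unshuffling example (illustrative sketch only).
import random


def make_unshuffling_example(sentences, seed=None):
    """Shuffle the sentences of a passage and return the shuffled passage
    together with the permutation that restores the original order."""
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # order[j] = original position of the j-th shuffled sentence; a model
    # trained on (shuffled, order) pairs learns to predict the ordering.
    return shuffled, order


passage = ["Rain delayed kickoff.", "The match started late.", "Fans waited patiently."]
shuffled, target = make_unshuffling_example(passage, seed=0)
print(shuffled, target)
```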