Natural Language-Assisted Sign Language Recognition
- URL: http://arxiv.org/abs/2303.12080v1
- Date: Tue, 21 Mar 2023 17:59:57 GMT
- Title: Natural Language-Assisted Sign Language Recognition
- Authors: Ronglai Zuo, Fangyun Wei, Brian Mak
- Abstract summary: We propose the Natural Language-Assisted Sign Language Recognition framework.
It exploits semantic information contained in glosses (sign labels) to mitigate the problem of visually indistinguishable signs (VISigns) in sign languages.
Our method achieves state-of-the-art performance on three widely-adopted benchmarks: MSASL, WLASL, and NMFs-CSL.
- Score: 28.64871971445024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign languages are visual languages which convey information by signers'
handshape, facial expression, body movement, and so forth. Due to the inherent
restriction of combinations of these visual ingredients, there exist a
significant number of visually indistinguishable signs (VISigns) in sign
languages, which limits the recognition capacity of vision neural networks. To
mitigate the problem, we propose the Natural Language-Assisted Sign Language
Recognition (NLA-SLR) framework, which exploits semantic information contained
in glosses (sign labels). First, for VISigns with similar semantic meanings, we
propose language-aware label smoothing by generating soft labels for each
training sign whose smoothing weights are computed from the normalized semantic
similarities among the glosses to ease training. Second, for VISigns with
distinct semantic meanings, we present an inter-modality mixup technique which
blends vision and gloss features to further maximize the separability of
different signs under the supervision of blended labels. Besides, we also
introduce a novel backbone, video-keypoint network, which not only models both
RGB videos and human body keypoints but also derives knowledge from sign videos
of different temporal receptive fields. Empirically, our method achieves
state-of-the-art performance on three widely-adopted benchmarks: MSASL, WLASL,
and NMFs-CSL. Codes are available at https://github.com/FangyunWei/SLRT.
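To make the first technique in the abstract more concrete, here is a minimal PyTorch-style sketch of language-aware label smoothing: the smoothing mass is distributed over classes according to normalized semantic similarities among gloss embeddings rather than uniformly. The function names, the epsilon value, and the use of cosine similarity over pretrained gloss embeddings are illustrative assumptions, not the authors' released implementation.

# Minimal sketch (an illustration, not the released code) of
# language-aware label smoothing.
import torch
import torch.nn.functional as F

def language_aware_soft_labels(gloss_embeddings, targets, epsilon=0.2):
    # gloss_embeddings: (num_classes, dim) text embeddings of the glosses,
    #   e.g. from a pretrained word-embedding model (assumed input).
    # targets: (batch,) integer class indices.
    emb = F.normalize(gloss_embeddings, dim=-1)
    sim = emb @ emb.t()                    # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))      # keep the true class out of the smoothing term
    weights = F.softmax(sim, dim=-1)       # normalized semantic similarities per class
    num_classes = gloss_embeddings.size(0)
    hard = F.one_hot(targets, num_classes).float()
    # (1 - epsilon) on the ground-truth gloss, epsilon spread by similarity
    return (1.0 - epsilon) * hard + epsilon * weights[targets]

def soft_label_cross_entropy(logits, soft_labels):
    return -(soft_labels * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()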
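A similarly hedged sketch of the inter-modality mixup idea follows: a vision feature is blended with the gloss feature of a different sign, and the blend is supervised with a label mixed in the same proportion. The shapes, the pairing of gloss partners, and the fixed mixing coefficient are assumptions made for illustration.

# Minimal sketch (assumed shapes and names, not the released code) of
# inter-modality mixup with blended labels.
import torch
import torch.nn.functional as F

def inter_modality_mixup(vision_feat, vision_labels, gloss_feat, gloss_labels,
                         num_classes, lam=0.5):
    # vision_feat: (batch, dim) features from the video/keypoint backbone.
    # gloss_feat:  (batch, dim) gloss (text) features, here paired with
    #   classes different from the vision samples.
    mixed_feat = lam * vision_feat + (1.0 - lam) * gloss_feat
    y_vision = F.one_hot(vision_labels, num_classes).float()
    y_gloss = F.one_hot(gloss_labels, num_classes).float()
    mixed_label = lam * y_vision + (1.0 - lam) * y_gloss
    return mixed_feat, mixed_label

# The mixed features would then pass through the classification head and be
# trained with a soft-label cross entropy such as the one sketched above.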
Related papers
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
- Enhancing Brazilian Sign Language Recognition through Skeleton Image Representation [2.6311088262657907]
This work proposes an Isolated Sign Language Recognition (ISLR) approach where body, hands, and facial landmarks are extracted throughout time and encoded as 2-D images.
We show that our method surpasses the state of the art on two widely recognized Brazilian Sign Language (LIBRAS) datasets.
In addition to being more accurate, our method is more time-efficient and easier to train due to its reliance on a simpler network architecture and solely RGB data as input.
arXiv Detail & Related papers (2024-04-29T23:21:17Z)
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
- Learnt Contrastive Concept Embeddings for Sign Recognition [33.72708697077754]
We focus on explicitly creating sign embeddings that bridge the gap between sign language and spoken language.
We train a vocabulary of embeddings based on the linguistic labels of sign videos.
We develop a conceptual similarity loss which is able to utilise word embeddings from NLP methods to create sign embeddings that have better sign language to spoken language correspondence.
arXiv Detail & Related papers (2023-08-18T12:47:18Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens [76.40196364163663]
We propose a learning-based vision-language pre-training approach that builds on contrastive frameworks such as CLIP.
We show that our method can learn more comprehensive representations and capture meaningful cross-modal correspondence.
arXiv Detail & Related papers (2023-03-27T00:58:39Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation, we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Global-local Enhancement Network for NMFs-aware Sign Language Recognition [135.30357113518127]
We propose a simple yet effective two-stream architecture called the Global-local Enhancement Network (GLE-Net): one stream captures the global contextual relationship, while the other captures discriminative fine-grained cues.
We also introduce the first non-manual-features-aware isolated Chinese sign language dataset, with a total vocabulary of 1,067 sign words used in daily life.
arXiv Detail & Related papers (2020-08-24T13:28:55Z)
- Temporal Accumulative Features for Sign Language Recognition [2.3204178451683264]
We devise an efficient and fast method for recognizing isolated sign language gestures.
We also incorporate hand shape information and, using a small-scale sequential neural network, demonstrate that modeling accumulative features for linguistic subunits improves upon baseline classification results.
arXiv Detail & Related papers (2020-04-02T19:03:40Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge from subtitled news signs to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)