Global-local Enhancement Network for NMFs-aware Sign Language
Recognition
- URL: http://arxiv.org/abs/2008.10428v2
- Date: Mon, 16 Aug 2021 03:38:16 GMT
- Title: Global-local Enhancement Network for NMFs-aware Sign Language
Recognition
- Authors: Hezhen Hu, Wengang Zhou, Junfu Pu, Houqiang Li
- Abstract summary: We propose a simple yet effective architecture called the Global-local Enhancement Network (GLE-Net), comprising two mutually promoted streams.
One stream captures the global contextual relationship, while the other captures discriminative fine-grained cues.
We introduce the first non-manual-features-aware isolated Chinese sign language dataset (NMFs-CSL), with a total vocabulary of 1,067 sign words used in daily life.
- Score: 135.30357113518127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign language recognition (SLR) is a challenging problem, involving complex
manual features, i.e., hand gestures, and fine-grained non-manual features
(NMFs), e.g., facial expressions and mouth shapes. Although manual features
are dominant, non-manual features also play an important role in the expression
of a sign word. Specifically, many sign words convey different meanings due to
non-manual features, even though they share the same hand gestures. This
ambiguity introduces great challenges in the recognition of sign words. To
tackle the above issue, we propose a simple yet effective architecture called
Global-local Enhancement Network (GLE-Net), including two mutually promoted
streams towards different crucial aspects of SLR. Of the two streams, one
captures the global contextual relationship, while the other stream captures
the discriminative fine-grained cues. Moreover, due to the lack of datasets
explicitly focusing on such features, we introduce the first
non-manual-features-aware isolated Chinese sign language dataset (NMFs-CSL),
with a total vocabulary of 1,067 sign words used in daily life. Extensive
experiments on NMFs-CSL and SLR500 datasets demonstrate the effectiveness of
our method.
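The page does not include code, but the two-stream idea can be sketched as below. This is a minimal, hypothetical PyTorch sketch (module names, the backbone stub, and all shapes are assumptions, not the authors' implementation): one stream summarizes global temporal context, the other re-weights frames as a proxy for fine-grained cues, and the two summaries are fused for classification over the 1,067-word vocabulary.
```python
# Minimal two-stream sketch; names and shapes are assumptions.
import torch
import torch.nn as nn

class TwoStreamSketch(nn.Module):
    def __init__(self, feat_dim=512, num_classes=1067):
        super().__init__()
        # Stand-in frame encoder: (B*T, 3, H, W) -> (B*T, feat_dim, 1, 1)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4),
            nn.AdaptiveAvgPool2d(1),
        )
        # Global stream: temporal context over the whole clip.
        self.global_stream = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Local stream: soft frame weighting standing in for fine-grained cues.
        self.local_attn = nn.Sequential(nn.Linear(feat_dim, 1), nn.Softmax(dim=1))
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, clip):                       # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        f = self.backbone(clip.flatten(0, 1)).flatten(1).view(b, t, -1)
        g, _ = self.global_stream(f)               # contextual features
        g = g.mean(dim=1)                          # global summary
        w = self.local_attn(f)                     # (B, T, 1) frame weights
        l = (w * f).sum(dim=1)                     # fine-grained summary
        return self.classifier(torch.cat([g, l], dim=-1))

logits = TwoStreamSketch()(torch.randn(2, 8, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 1067])
```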
Related papers
- Scaling up Multimodal Pre-training for Sign Language Understanding [96.17753464544604]
Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
arXiv Detail & Related papers (2024-08-16T06:04:25Z)
- SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation [3.9711029428461653]
We introduce a new task named multi-channel sign language translation (MCSLT).
We present a novel metric, SignBLEU, designed to capture multiple signal channels.
We found that SignBLEU consistently correlates better with human judgment than competing metrics.
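As a rough illustration of multi-channel scoring (not the actual SignBLEU definition, which is more involved), one can compute a clipped n-gram precision per channel and combine the channels by geometric mean:
```python
# Illustrative only: per-channel n-gram precision combined by geometric mean.
from collections import Counter
from math import exp, log

def ngram_precision(hyp, ref, n=2):
    """Clipped n-gram precision of one channel's token sequence."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    total = sum(hyp_ngrams.values())
    return overlap / total if total else 0.0

def multichannel_score(hyp_channels, ref_channels, n=2):
    """Geometric mean of per-channel precisions (zero if any channel is zero)."""
    scores = [ngram_precision(h, r, n)
              for h, r in zip(hyp_channels, ref_channels)]
    if any(s == 0.0 for s in scores):
        return 0.0
    return exp(sum(log(s) for s in scores) / len(scores))

# Hypothetical manual + mouthing channels for one signed sentence:
hyp = [["IX", "HOUSE", "GO"], ["mm", "house", "go"]]
ref = [["IX", "HOUSE", "GO"], ["mm", "house", "going"]]
print(round(multichannel_score(hyp, ref), 3))  # 0.707
```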
arXiv Detail & Related papers (2024-06-10T05:01:26Z)
- MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition [94.56755080185732]
We propose a Motion-aware Masked Autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information.
Our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation.
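A minimal sketch of what motion-aware masking could look like on pose input, assuming (T, J, 2) keypoints; the selection rule and ratio are illustrative, not the authors' design:
```python
# Sketch: mask the highest-motion frames of a pose sequence; a masked
# autoencoder would encode the visible frames and reconstruct the rest.
import torch

def motion_aware_mask(pose, mask_ratio=0.5):
    """Return a boolean mask (T,) selecting the highest-motion frames."""
    motion = (pose[1:] - pose[:-1]).norm(dim=-1).mean(dim=-1)  # (T-1,)
    motion = torch.cat([motion[:1], motion])                   # pad to (T,)
    k = int(mask_ratio * pose.shape[0])
    idx = motion.topk(k).indices
    mask = torch.zeros(pose.shape[0], dtype=torch.bool)
    mask[idx] = True
    return mask

pose = torch.randn(16, 21, 2)           # 16 frames, 21 hand joints
mask = motion_aware_mask(pose)
visible = pose[~mask]                   # only these would be encoded
print(mask.sum().item(), visible.shape) # 8 masked frames, (8, 21, 2) visible
```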
arXiv Detail & Related papers (2024-05-31T08:06:05Z)
- Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing the isolated signs that appear in the two datasets.
Then we identify the sign-to-sign mappings between the two sign languages via a well-optimized isolated sign language recognition model.
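The mapping step can be illustrated with a short, hypothetical sketch: run a recognizer trained on language B over isolated clips of language A and keep the majority-vote prediction per gloss. All names here are invented for illustration:
```python
# Hedged sketch of sign-to-sign mapping via an isolated recognizer.
from collections import Counter, defaultdict

def build_sign_mapping(a_clips, islr_model_b, min_votes=3):
    """a_clips: iterable of (a_gloss, clip); islr_model_b(clip) -> b_gloss."""
    votes = defaultdict(Counter)
    for a_gloss, clip in a_clips:
        votes[a_gloss][islr_model_b(clip)] += 1
    mapping = {}
    for a_gloss, counter in votes.items():
        b_gloss, count = counter.most_common(1)[0]
        if count >= min_votes:            # keep only confident mappings
            mapping[a_gloss] = b_gloss
    return mapping

# Toy usage with a dummy recognizer that always predicts one gloss:
mapping = build_sign_mapping(
    [("HELLO-A", f"clip{i}") for i in range(4)],
    islr_model_b=lambda clip: "HELLO-B",
)
print(mapping)  # {'HELLO-A': 'HELLO-B'}
```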
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
- Natural Language-Assisted Sign Language Recognition [28.64871971445024]
We propose the Natural Language-Assisted Sign Language Recognition framework.
It exploits semantic information contained in glosses (sign labels) to mitigate the problem of visually indistinguishable signs (VISigns) in sign languages.
Our method achieves state-of-the-art performance on three widely-adopted benchmarks: MSASL, WLASL, and NMFs-CSL.
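One way gloss semantics can mitigate visually indistinguishable signs is to soften each one-hot label toward semantically similar glosses. The sketch below is a generic, hedged illustration of that idea (random embeddings stand in for real word vectors; it is not the framework's exact formulation):
```python
# Language-aware soft labels: smooth one-hot gloss labels toward glosses
# whose text embeddings are similar, so related signs share probability mass.
import torch
import torch.nn.functional as F

def language_aware_labels(target, gloss_emb, smoothing=0.2, temp=0.1):
    """target: (B,) gloss ids; gloss_emb: (V, D) text embeddings."""
    sim = F.normalize(gloss_emb, dim=-1) @ F.normalize(gloss_emb, dim=-1).T
    soft = F.softmax(sim[target] / temp, dim=-1)      # (B, V), self-peaked
    hard = F.one_hot(target, gloss_emb.shape[0]).float()
    return (1 - smoothing) * hard + smoothing * soft

labels = language_aware_labels(torch.tensor([0, 3]), torch.randn(10, 64))
print(labels.shape, labels.sum(dim=-1))  # (2, 10), rows sum to 1
```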
arXiv Detail & Related papers (2023-03-21T17:59:57Z)
- Self-Sufficient Framework for Continuous Sign Language Recognition [75.60327502570242]
The goal of this work is to develop a self-sufficient framework for Continuous Sign Language Recognition.
Challenges include the need for complex multi-scale features, such as hands, face, and mouth, and the absence of frame-level annotations.
We propose Divide and Focus Convolution (DFConv), which extracts both manual and non-manual features without the need for additional networks or annotations.
Dense Pseudo-Label Refinement (DPLR) propagates non-spiky frame-level pseudo-labels by combining the ground-truth gloss sequence labels with the predicted sequence.
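A plausible reading of the divide-and-focus idea, sketched under our own assumptions (the paper's DFConv details differ): split each frame into an upper, non-manual region and a lower, manual region and convolve them separately:
```python
# Hedged sketch: separate convolution paths for face/mouth and hand regions.
import torch
import torch.nn as nn

class DivideFocusSketch(nn.Module):
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        self.upper = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # non-manual path
        self.lower = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # manual path

    def forward(self, x):                 # x: (B, C, H, W)
        h = x.shape[2] // 2
        top = self.upper(x[:, :, :h])     # face / mouth region
        bottom = self.lower(x[:, :, h:])  # hand region
        return torch.cat([top, bottom], dim=2)  # reassemble along height

out = DivideFocusSketch()(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 32, 224, 224])
```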
arXiv Detail & Related papers (2023-03-21T11:42:57Z)
- Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance with significant margins.
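The ensemble idea can be illustrated with a minimal late-fusion sketch: learned softmax weights over per-modality logits. This is a generic stand-in, not the actual GEM:
```python
# Generic late fusion: one learned weight per modality combines logits.
import torch
import torch.nn as nn

class GlobalEnsembleSketch(nn.Module):
    def __init__(self, num_modalities):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_modalities))

    def forward(self, logits_list):            # list of (B, C) logit tensors
        stacked = torch.stack(logits_list)     # (M, B, C)
        w = torch.softmax(self.weights, dim=0).view(-1, 1, 1)
        return (w * stacked).sum(dim=0)        # fused logits (B, C)

# Hypothetical RGB, skeleton, and optical-flow streams (dummy logits):
fuse = GlobalEnsembleSketch(num_modalities=3)
print(fuse([torch.randn(2, 500) for _ in range(3)]).shape)  # torch.Size([2, 500])
```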
arXiv Detail & Related papers (2021-10-12T16:57:18Z)
- Temporal Accumulative Features for Sign Language Recognition [2.3204178451683264]
We have devised an efficient and fast SLR method for recognizing isolated sign language gestures.
We also incorporate hand shape information and, using a small-scale sequential neural network, demonstrate that modeling accumulative features for linguistic subunits improves upon baseline classification results.
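A minimal sketch of temporal accumulation, with an illustrative exponential decay (the paper defines its accumulative features differently in detail): each clip reduces to one fixed-size descriptor that a small sequential network or classifier can consume:
```python
# Running, decayed sum of per-frame features as a clip descriptor.
import numpy as np

def accumulate_features(frames, decay=0.9):
    """frames: (T, D) per-frame features -> (D,) accumulative descriptor."""
    acc = np.zeros(frames.shape[1])
    for f in frames:
        acc = decay * acc + (1 - decay) * f   # exponential moving accumulation
    return acc

desc = accumulate_features(np.random.randn(32, 128))
print(desc.shape)  # (128,)
```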
arXiv Detail & Related papers (2020-04-02T19:03:40Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news signs to them.
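One standard recipe for domain-invariant features is a gradient-reversal domain classifier (Ganin & Lempitsky, 2015), shown here only to illustrate the invariance idea, not as this paper's actual transfer method:
```python
# Gradient reversal: the feature extractor is trained to fool a domain
# classifier, pushing features toward domain invariance.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None      # flip gradient toward invariance

features = torch.randn(4, 256, requires_grad=True)
domain_head = nn.Linear(256, 2)           # isolated-word vs news-sign domain
domain_logits = domain_head(GradReverse.apply(features, 1.0))
loss = nn.functional.cross_entropy(domain_logits, torch.tensor([0, 0, 1, 1]))
loss.backward()                           # features receive reversed gradients
print(features.grad.shape)  # torch.Size([4, 256])
```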
arXiv Detail & Related papers (2020-03-08T03:05:21Z)