Scaling up Multimodal Pre-training for Sign Language Understanding
- URL: http://arxiv.org/abs/2408.08544v1
- Date: Fri, 16 Aug 2024 06:04:25 GMT
- Title: Scaling up Multimodal Pre-training for Sign Language Understanding
- Authors: Wengang Zhou, Weichao Zhao, Hezhen Hu, Zecheng Li, Houqiang Li
- Abstract summary: Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
- Score: 96.17753464544604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign language serves as the primary means of communication for the deaf-mute community. Unlike spoken language, it commonly conveys information through the collaboration of manual features, i.e., hand gestures and body movements, and non-manual features, i.e., facial expressions and mouth cues. To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied in recent years, including isolated/continuous sign language recognition (ISLR/CSLR), gloss-free sign language translation (GF-SLT) and sign language retrieval (SL-RT). Sign language recognition and translation aim to understand the semantic meaning conveyed by sign languages at the gloss level and sentence level, respectively. In contrast, SL-RT focuses on retrieving sign videos or corresponding texts from a closed set under the query-by-example search paradigm. These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representations of sign language videos. To advance the development of sign language understanding, exploring a generalized model that is applicable across various SLU tasks is a profound research direction.
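The query-by-example retrieval paradigm mentioned above can be illustrated with a minimal sketch: a query embedding (e.g., of a sign video) is scored against a closed gallery of candidate embeddings (e.g., of texts) by cosine similarity, and the top-ranked items are returned. The function names and toy data here are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=3):
    """Return indices of the top_k gallery items most similar to the query.

    Embeddings are L2-normalized so the dot product equals cosine similarity.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = g @ q                       # cosine similarity per gallery item
    return np.argsort(-scores)[:top_k]   # highest-scoring items first

# Toy check: four gallery embeddings, one query close to item 2.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 8))
query = gallery[2] + 0.01 * rng.normal(size=8)
print(retrieve(query, gallery))  # item 2 ranks first
```

In practice the embeddings would come from jointly trained video and text encoders; the retrieval step itself reduces to exactly this kind of similarity ranking over a closed set.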
Related papers
- Continuous Sign Language Recognition System using Deep Learning with MediaPipe Holistic [1.9874264019909988]
Sign languages are visual languages used by hearing-impaired people for communication.
Approximately 300 sign languages are practiced worldwide, such as American Sign Language (ASL), Chinese Sign Language (CSL), and Indian Sign Language (ISL).
arXiv Detail & Related papers (2024-11-07T08:19:39Z) - SCOPE: Sign Language Contextual Processing with Embedding from LLMs [49.5629738637893]
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information.
Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information.
We introduce SCOPE, a novel context-aware vision-based SLR and SLT framework.
arXiv Detail & Related papers (2024-09-02T08:56:12Z) - EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event cameras can naturally perceive dynamic hand movements, providing rich manual cues for sign language tasks.
We propose an efficient transformer-based framework for event-based SLR and SLT tasks.
Our method performs favorably against existing state-of-the-art approaches with only 0.34% of the computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z) - Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production [9.065171626657818]
Universal Gloss-level Representation (UniGloR) is a unified and self-supervised solution for both Sign Language Translation and Sign Language Production.
Our results demonstrate UniGloR's effectiveness in the translation and production tasks.
Our study suggests that self-supervised learning can be made in a unified manner, paving the way for innovative and practical applications.
arXiv Detail & Related papers (2024-07-03T07:12:36Z) - SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation [3.9711029428461653]
We introduce a new task named multi-channel sign language translation (MCSLT)
We present a novel metric, SignBLEU, designed to capture multiple signal channels.
We found that SignBLEU consistently correlates better with human judgment than competing metrics.
arXiv Detail & Related papers (2024-06-10T05:01:26Z) - Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
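The sign-to-sign mapping step described above can be sketched as nearest-neighbor matching between the two dictionaries in a shared embedding space. This is a simplified assumption: the paper identifies mappings via a well-optimized ISLR model, whereas here the embeddings, names, and similarity choice are all hypothetical.

```python
import numpy as np

def map_signs(src_embs, tgt_embs):
    """Match each source-language sign to its nearest target-language sign.

    src_embs: (n_src, d) embeddings of isolated signs from language A.
    tgt_embs: (n_tgt, d) embeddings of isolated signs from language B.
    Returns, for each source sign, the index of the most similar target sign.
    """
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sims = src @ tgt.T               # pairwise cosine similarities
    return np.argmax(sims, axis=1)   # best-matching target sign per source sign

# Toy check: three target signs, source signs are noisy permuted copies.
rng = np.random.default_rng(1)
tgt = np.eye(3)
src = tgt[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 3))
print(map_signs(src, tgt))  # recovers the permutation [2 0 1]
```

The recovered mappings can then be used to relabel one corpus with the other's vocabulary, enlarging the training data for continuous recognition.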
arXiv Detail & Related papers (2023-08-21T15:58:47Z) - All You Need In Sign Language Production [50.3955314892191]
Sign language recognition and production need to cope with some critical challenges.
We present an introduction to Deaf culture, Deaf centers, and the psychological perspective of sign language.
Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented.
arXiv Detail & Related papers (2022-01-05T13:45:09Z) - Sign Language Production: A Review [51.07720650677784]
Sign Language is the dominant, yet non-primary, form of communication used in the deaf and hearing-impaired community.
To enable easy, two-way communication between the hearing-impaired and hearing communities, building a robust system capable of translating spoken language into sign language is fundamental.
To this end, sign language recognition and production are two necessary parts for making such a two-way system.
arXiv Detail & Related papers (2021-03-29T19:38:22Z) - Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.