Scaling up Multimodal Pre-training for Sign Language Understanding
- URL: http://arxiv.org/abs/2408.08544v1
- Date: Fri, 16 Aug 2024 06:04:25 GMT
- Title: Scaling up Multimodal Pre-training for Sign Language Understanding
- Authors: Wengang Zhou, Weichao Zhao, Hezhen Hu, Zecheng Li, Houqiang Li
- Abstract summary: Sign language serves as the primary means of communication for the deaf-mute community.
To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied.
These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representation of sign language videos.
- Score: 96.17753464544604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sign language serves as the primary means of communication for the deaf-mute community. Unlike spoken language, it commonly conveys information through the collaboration of manual features, i.e., hand gestures and body movements, and non-manual features, i.e., facial expressions and mouth cues. To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied in recent years, including isolated/continuous sign language recognition (ISLR/CSLR), gloss-free sign language translation (GF-SLT) and sign language retrieval (SL-RT). Sign language recognition and translation aim to understand the semantic meaning conveyed by sign languages at the gloss level and sentence level, respectively. In contrast, SL-RT focuses on retrieving sign videos or corresponding texts from a closed set under the query-by-example search paradigm. These tasks investigate sign language topics from diverse perspectives and raise challenges in learning effective representations of sign language videos. To advance the development of sign language understanding, exploring a generalized model that is applicable across various SLU tasks is a profound research direction.
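The query-by-example retrieval paradigm mentioned above can be illustrated with a minimal sketch: a query embedding (e.g., of a sign video) is scored against a closed gallery of candidate embeddings (e.g., of texts) by cosine similarity, and the top-ranked items are returned. The function names and toy data here are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

def retrieve(query_emb, gallery_embs, top_k=3):
    """Return indices of the top_k gallery items most similar to the query.

    Embeddings are L2-normalized so the dot product equals cosine similarity.
    """
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = g @ q                       # cosine similarity per gallery item
    return np.argsort(-scores)[:top_k]   # highest-scoring items first

# Toy check: four gallery embeddings, one query close to item 2.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(4, 8))
query = gallery[2] + 0.01 * rng.normal(size=8)
print(retrieve(query, gallery))  # item 2 ranks first
```

In practice the embeddings would come from jointly trained video and text encoders; the retrieval step itself reduces to exactly this kind of similarity ranking over a closed set.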
Related papers
- Continuous Sign Language Recognition System using Deep Learning with MediaPipe Holistic [1.9874264019909988]
Sign languages are visual languages used by hearing-impaired people for communication.
Approximately 300 sign languages are practiced worldwide, such as American Sign Language (ASL), Chinese Sign Language (CSL), and Indian Sign Language (ISL).
arXiv Detail & Related papers (2024-11-07T08:19:39Z) - SCOPE: Sign Language Contextual Processing with Embedding from LLMs [49.5629738637893]
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information.
Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information.
We introduce SCOPE, a novel context-aware vision-based SLR and SLT framework.
arXiv Detail & Related papers (2024-09-02T08:56:12Z) - EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event cameras can naturally perceive dynamic hand movements, providing rich manual cues for sign language tasks.
We propose an efficient transformer-based framework for event-based SLR and SLT tasks.
Our method performs favorably against existing state-of-the-art approaches with only 0.34% of the computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z) - Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production [9.065171626657818]
Universal Gloss-level Representation (UniGloR) is a unified and self-supervised solution for both Sign Language Translation and Sign Language Production.
Our results demonstrate UniGloR's effectiveness in the translation and production tasks.
Our study suggests that self-supervised learning can be made in a unified manner, paving the way for innovative and practical applications.
arXiv Detail & Related papers (2024-07-03T07:12:36Z) - SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation [3.9711029428461653]
We introduce a new task named multi-channel sign language translation (MCSLT)
We present a novel metric, SignBLEU, designed to capture multiple signal channels.
We found that SignBLEU consistently correlates better with human judgment than competing metrics.
arXiv Detail & Related papers (2024-06-10T05:01:26Z) - Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
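The sign-to-sign mapping step described above can be sketched as nearest-neighbor matching between the two dictionaries in a shared embedding space. This is a simplified assumption: the paper identifies mappings via a well-optimized ISLR model, whereas here the embeddings, names, and similarity choice are all hypothetical.

```python
import numpy as np

def map_signs(src_embs, tgt_embs):
    """Match each source-language sign to its nearest target-language sign.

    src_embs: (n_src, d) embeddings of isolated signs from language A.
    tgt_embs: (n_tgt, d) embeddings of isolated signs from language B.
    Returns, for each source sign, the index of the most similar target sign.
    """
    src = src_embs / np.linalg.norm(src_embs, axis=1, keepdims=True)
    tgt = tgt_embs / np.linalg.norm(tgt_embs, axis=1, keepdims=True)
    sims = src @ tgt.T               # pairwise cosine similarities
    return np.argmax(sims, axis=1)   # best-matching target sign per source sign

# Toy check: three target signs, source signs are noisy permuted copies.
rng = np.random.default_rng(1)
tgt = np.eye(3)
src = tgt[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 3))
print(map_signs(src, tgt))  # recovers the permutation [2 0 1]
```

The recovered mappings can then be used to relabel one corpus with the other's vocabulary, enlarging the training data for continuous recognition.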
arXiv Detail & Related papers (2023-08-21T15:58:47Z) - All You Need In Sign Language Production [50.3955314892191]
Sign language recognition and production need to cope with some critical challenges.
We present an introduction to Deaf culture, Deaf centers, and the psychological perspective of sign language.
Also, the backbone architectures and methods in SLP are briefly introduced and the proposed taxonomy on SLP is presented.
arXiv Detail & Related papers (2022-01-05T13:45:09Z) - Sign Language Production: A Review [51.07720650677784]
Sign Language is the dominant, yet non-primary, form of communication used in the deaf and hearing-impaired community.
To enable easy, two-way communication between the hearing-impaired and hearing communities, building a robust system capable of translating spoken language into sign language is fundamental.
To this end, sign language recognition and production are two necessary parts for making such a two-way system.
arXiv Detail & Related papers (2021-03-29T19:38:22Z) - Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.