MHB: Multimodal Handshape-aware Boundary Detection for Continuous Sign Language Recognition
- URL: http://arxiv.org/abs/2511.19907v1
- Date: Tue, 25 Nov 2025 04:31:12 GMT
- Title: MHB: Multimodal Handshape-aware Boundary Detection for Continuous Sign Language Recognition
- Authors: Mingyu Zhao, Zhanfu Yang, Yang Zhou, Zhaoyang Xia, Can Jin, Xiaoxiao He, Carol Neidle, Dimitris N. Metaxas
- Abstract summary: We use machine learning to detect the start and end frames of signs in videos of American Sign Language (ASL) sentences. For improved robustness, we use 3D skeletal features extracted from sign language videos to capture the convergence of sign properties. A multimodal fusion module is then used to unify the pretrained sign video segmentation framework and the handshape classification models.
- Score: 29.45413576236808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a multimodal approach for continuous sign recognition that first uses machine learning to detect the start and end frames of signs in videos of American Sign Language (ASL) sentences, and then recognizes the segmented signs. For improved robustness, we use 3D skeletal features extracted from sign language videos to capture the convergence of sign properties and their dynamics, which tend to cluster at sign boundaries. Another focus of this work is the incorporation of information from 3D handshape for boundary detection. To detect handshapes normally expected at the beginning and end of signs, we pretrain a handshape classifier for 87 linguistically defined canonical handshape categories using a dataset that we created by integrating and normalizing several existing datasets. A multimodal fusion module is then used to unify the pretrained sign video segmentation framework and the handshape classification models. Finally, the estimated boundaries are used for sign recognition, where the recognition model is trained on a large database containing both citation-form isolated signs and signs pre-segmented (based on manual annotations) from continuous signing, as such signs often differ in certain respects. We evaluate our method on the ASLLRP corpus and demonstrate significant improvements over previous work.
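The fusion step described in the abstract (per-frame skeletal features combined with handshape-classifier outputs to score sign boundaries) can be sketched roughly as follows. All feature dimensions, the random linear scorer, and the variable names are illustrative assumptions for a late-fusion setup, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame inputs for a T-frame clip (dimensions are
# illustrative, not the paper's actual feature sizes).
T = 60
skeletal_feats = rng.normal(size=(T, 64))        # 3D skeletal features per frame
handshape_probs = rng.dirichlet(np.ones(87), T)  # softmax over the 87 canonical handshapes

# Late fusion: concatenate the two modalities per frame, then apply a
# stand-in linear scorer followed by a sigmoid to get a per-frame
# boundary probability.
fused = np.concatenate([skeletal_feats, handshape_probs], axis=1)  # (T, 151)
w = rng.normal(size=fused.shape[1])
boundary_score = 1.0 / (1.0 + np.exp(-(fused @ w)))                # in (0, 1) per frame

# Frames whose score exceeds a threshold are candidate sign boundaries;
# the recognizer would then run on the segments between them.
candidates = np.flatnonzero(boundary_score > 0.5)
```

In the paper the scorer is a learned fusion module rather than a fixed linear layer, but the data flow (two modality streams joined per frame, then a per-frame boundary decision) is the same shape.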
Related papers
- Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture [0.4083182125683813]
We present a first step toward sign language retrieval by addressing the challenge of detecting the presence or absence of a query sign video. Unlike conventional approaches that rely on intermediate gloss recognition or text-based matching, we propose an end-to-end model that directly operates on pose keypoints extracted from sign videos. Our architecture employs an encoder-only backbone with a binary classification head to determine whether the query sign appears within the target sequence.
arXiv Detail & Related papers (2025-12-09T15:49:23Z) - Hands-On: Segmenting Individual Signs from Continuous Sequences [28.01996053847279]
We propose a transformer-based architecture that models the temporal dynamics of signing and frame segmentation. Our model achieves state-of-the-art results on the DGS Corpus, while our features surpass prior benchmarks on the BSL Corpus.
arXiv Detail & Related papers (2025-04-11T14:52:59Z) - SignRep: Enhancing Self-Supervised Sign Representations [30.008980708977095]
Sign language representation learning presents unique challenges due to the complex spatio-temporal nature of signs and the scarcity of labeled datasets. We introduce a scalable, self-supervised framework for sign representation learning. Our model does not require skeletal keypoints during inference, avoiding the limitations of keypoint-based models in downstream tasks. It excels in sign dictionary retrieval and sign translation, surpassing standard MAE pre-training and skeletal-based representations in retrieval.
arXiv Detail & Related papers (2025-03-11T15:20:01Z) - Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs. We propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs.
arXiv Detail & Related papers (2024-11-26T18:28:09Z) - MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production [93.32354378820648]
We propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users.
A sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step.
Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
arXiv Detail & Related papers (2024-07-04T13:53:50Z) - A Transformer Model for Boundary Detection in Continuous Sign Language [55.05986614979846]
The Transformer model is employed for both Isolated Sign Language Recognition and Continuous Sign Language Recognition.
The training process involves using isolated sign videos, where hand keypoint features extracted from the input video are enriched.
The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos.
arXiv Detail & Related papers (2024-02-22T17:25:01Z) - Linguistically Motivated Sign Language Segmentation [51.06873383204105]
We consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases.
Our method is motivated by linguistic cues observed in sign language corpora.
We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing.
arXiv Detail & Related papers (2023-10-21T10:09:34Z) - Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z) - SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep-learning-based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable framework, SignBERT+, which incorporates a model-aware hand prior.
arXiv Detail & Related papers (2023-05-08T17:16:38Z) - Word separation in continuous sign language using isolated signs and post-processing [47.436298331905775]
We propose a two-stage model for Continuous Sign Language Recognition.
In the first stage, the predictor model, which includes a combination of CNN, SVD, and LSTM, is trained with the isolated signs.
In the second stage, we apply a post-processing algorithm to the Softmax outputs obtained from the first part of the model.
arXiv Detail & Related papers (2022-04-02T18:34:33Z)
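The post-processing idea in this last entry (cleaning up per-frame softmax outputs before reading off word segments) can be illustrated with a toy smoothing-and-merging routine. The function name, the moving-average window, and the segment format are hypothetical stand-ins, not the algorithm from that paper:

```python
import numpy as np
from itertools import groupby

def segment_words(frame_probs: np.ndarray, win: int = 5) -> list[tuple[int, int, int]]:
    """Toy post-processing: smooth per-frame softmax outputs with a moving
    average over time, take the argmax per frame, and merge runs of identical
    labels into (label, start_frame, end_frame) segments."""
    kernel = np.ones(win) / win
    # Smooth each class channel along the time axis (same-length output).
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, frame_probs
    )
    labels = smoothed.argmax(axis=1)
    segments, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((int(label), start, start + length - 1))
        start += length
    return segments
```

For example, on a 20-frame clip whose softmax outputs favor class 2 for the first ten frames and class 0 for the rest, `segment_words` returns two segments, `(2, 0, 9)` and `(0, 10, 19)`; the smoothing window suppresses single-frame label flickers that would otherwise split a sign in two.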
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.