An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface
- URL: http://arxiv.org/abs/2408.09311v1
- Date: Sat, 17 Aug 2024 23:59:17 GMT
- Title: An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface
- Authors: Kevin Jose Thomas
- Abstract summary: This paper introduces an open-source interface for American Sign Language fingerspell recognition and semantic pose retrieval.
We discuss the technical details of the model architecture, application in the wild, as well as potential future enhancements for real-world consumer applications.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces an open-source interface for American Sign Language fingerspell recognition and semantic pose retrieval, aimed to serve as a stepping stone towards more advanced sign language translation systems. Utilizing a combination of convolutional neural networks and pose estimation models, the interface provides two modular components: a recognition module for translating ASL fingerspelling into spoken English and a production module for converting spoken English into ASL pose sequences. The system is designed to be highly accessible, user-friendly, and capable of functioning in real-time under varying environmental conditions like backgrounds, lighting, skin tones, and hand sizes. We discuss the technical details of the model architecture, application in the wild, as well as potential future enhancements for real-world consumer applications.
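To make the production module concrete: semantic pose retrieval can be implemented by embedding the input English text and returning the pose sequence whose gloss embedding is nearest in cosine similarity. The sketch below is illustrative only; the encoder choice, file layout, and function names are assumptions, not details taken from the paper.

```python
# Illustrative sketch of semantic pose retrieval for the production module.
# The sentence encoder, gloss vocabulary, and .npy pose files are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical dictionary: English gloss -> pose sequence of shape [frames, joints, 3].
pose_dictionary = {
    "hello": np.load("poses/hello.npy"),
    "thank you": np.load("poses/thank_you.npy"),
}
gloss_embeddings = {g: encoder.encode(g) for g in pose_dictionary}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_pose(text: str) -> np.ndarray:
    """Return the pose sequence whose gloss is semantically closest to `text`."""
    query = encoder.encode(text)
    best = max(gloss_embeddings, key=lambda g: cosine(query, gloss_embeddings[g]))
    return pose_dictionary[best]
```

Retrieval rather than generation keeps the module simple: new signs can be supported by adding entries to the pose dictionary.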
Related papers
- Isolated Sign Language Recognition with Segmentation and Pose Estimation [0.0]
Isolated sign language recognition can bridge the gap between large language models and American Sign Language (ASL) users.
We propose a model for ISLR that reduces computational requirements while maintaining robustness to signer variation.
arXiv Detail & Related papers (2025-12-16T19:44:12Z)
- Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment [84.39962912136525]
We develop a model for sign language understanding that performs sign language translation (SLT) and sign-subtitle alignment (SSA).
Our approach is built upon three components: (i) a lightweight visual backbone that captures manual and non-manual cues from human keypoints and lip-region images; (ii) a Sliding Perceiver mapping network that aggregates consecutive visual features into word-level embeddings; and (iii) a multi-task scalable training strategy that jointly optimises SLT and SSA.
arXiv Detail & Related papers (2025-12-08T21:05:46Z)
- Large Sign Language Models: Toward 3D American Sign Language Translation [33.777693392753385]
We present Large Sign Language Models (LSLM), a novel framework for translating 3D American Sign Language (ASL).
Unlike existing sign language recognition methods that rely on 2D video, our approach directly utilizes 3D sign language data to capture rich spatial, gestural, and depth information in 3D scenes.
This enables more accurate and resilient translation, enhancing digital communication accessibility for the hearing-impaired community.
arXiv Detail & Related papers (2025-11-11T18:16:43Z)
- Reactive Semantics for User Interface Description Languages [0.0]
We propose a denotational semantic model for a core reactive UIDL, Smalite.
This work may be used as a stepping stone to produce a formally verified compiler for UIDLs.
arXiv Detail & Related papers (2025-08-19T08:16:28Z)
- Indian Sign Language Detection for Real-Time Translation using Machine Learning [0.1747623282473278]
We propose a robust, real-time ISL detection and translation system built upon a Convolutional Neural Network (CNN).
Our model is trained on a comprehensive ISL dataset and demonstrates exceptional performance, achieving a classification accuracy of 99.95%.
For real-time implementation, the framework integrates MediaPipe for precise hand tracking and motion detection, enabling seamless translation of dynamic gestures.
arXiv Detail & Related papers (2025-07-27T21:15:46Z)
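Both this entry and the main paper pair a pose-estimation front end with a CNN classifier. A generic version of that recognition loop, built on MediaPipe hand landmarks with an untrained PyTorch stand-in for the classifier (not code from either paper), looks like this:

```python
# Generic real-time fingerspelling recognition loop: MediaPipe extracts
# 21 hand landmarks per frame; a classifier maps them to a letter.
# The model below is an untrained stand-in, not the papers' trained CNN.
import cv2
import mediapipe as mp
import numpy as np
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(21 * 3, 128), torch.nn.ReLU(), torch.nn.Linear(128, 26)
)
model.eval()
LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        pts = results.multi_hand_landmarks[0].landmark
        feats = np.array([[p.x, p.y, p.z] for p in pts], dtype=np.float32).ravel()
        with torch.no_grad():
            logits = model(torch.from_numpy(feats).unsqueeze(0))
        print(LABELS[int(logits.argmax())])
```

Classifying normalized landmarks rather than raw pixels is one common route to the robustness to backgrounds, lighting, and skin tones that both abstracts claim.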
- Real-Time Multilingual Sign Language Processing [4.626189039960495]
Sign Language Processing (SLP) is an interdisciplinary field comprised of Natural Language Processing (NLP) and Computer Vision.
Traditional approaches have often been constrained by the use of gloss-based systems that are both language-specific and inadequate for capturing the multidimensional nature of sign language.
We propose the use of SignWiring, a universal sign language transcription notation system, to serve as an intermediary link between the visual-gestural modality of signed languages and text-based linguistic representations.
arXiv Detail & Related papers (2024-12-02T21:51:41Z)
- SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
SHuBERT (Sign Hidden-Unit BERT) is a self-supervised contextual representation model learned from 1,000 hours of American Sign Language video.
SHuBERT adapts masked token prediction objectives to multi-stream visual sign language input, learning to predict multiple targets corresponding to clustered hand, face, and body pose streams.
SHuBERT achieves state-of-the-art performance across multiple tasks including sign language translation, isolated sign language recognition, and fingerspelling detection.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
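The multi-stream cluster-prediction objective described in the SHuBERT entry can be sketched in a few lines. All dimensions, the 15% mask rate, and the additive stream fusion below are illustrative assumptions, not SHuBERT's actual configuration:

```python
# Sketch of masked multi-stream cluster prediction: mask random frames,
# then predict each stream's (precomputed) cluster IDs at the masked
# positions. Dimensions, mask rate, and fusion are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

T, D, K = 100, 256, 500                 # frames, feature dim, clusters per stream
streams = ["hand", "face", "body"]

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True), num_layers=2
)
heads = nn.ModuleDict({s: nn.Linear(D, K) for s in streams})
mask_embedding = nn.Parameter(torch.zeros(D))

features = {s: torch.randn(1, T, D) for s in streams}        # stand-in features
targets = {s: torch.randint(0, K, (1, T)) for s in streams}  # k-means cluster IDs

x = sum(features.values())              # naive additive fusion (assumption)
mask = torch.rand(1, T) < 0.15          # mask ~15% of frames
x[mask] = mask_embedding
h = encoder(x)

loss = sum(F.cross_entropy(heads[s](h)[mask], targets[s][mask]) for s in streams)
```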
- Generating Signed Language Instructions in Large-Scale Dialogue Systems [25.585339304165466]
We introduce a goal-oriented conversational AI system enhanced with American Sign Language (ASL) instructions.
Our system receives input from users and seamlessly generates ASL instructions by leveraging retrieval methods and cognitively based gloss translations.
arXiv Detail & Related papers (2024-10-17T20:56:29Z)
- Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge.
This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition.
We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
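The core recipe behind generative error correction is simple enough to show: hand the language model the recognizer's N-best list and ask for a single corrected transcript. The prompt wording and the `query_llm` call below are illustrative, not part of the GenSEC challenge kit:

```python
# Generic sketch of post-ASR generative error correction: the language model
# sees the recognizer's N-best hypotheses and produces a corrected transcript.
def build_correction_prompt(nbest: list[str]) -> str:
    hypotheses = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return (
        "The following are N-best hypotheses from a speech recognizer.\n"
        f"{hypotheses}\n"
        "Combine them into the single most likely correct transcript:"
    )

nbest = [
    "i scream is my favorite dessert",
    "ice cream is my favorite desert",
    "ice cream is my favourite dessert",
]
prompt = build_correction_prompt(nbest)
# corrected = query_llm(prompt)  # hypothetical call to any LLM API
```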
- Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability [0.0]
We suggest a novel solution that uses a deep neural network to fully automate sign language recognition.
This methodology integrates sophisticated preprocessing methodologies to optimise the overall performance.
Our model's ability to provide informational clarity was assessed using the SHAP (SHapley Additive exPlanations) method.
arXiv Detail & Related papers (2024-09-11T17:17:44Z)
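For the SHAP step, the usual pattern with the `shap` library is to fit an explainer against background inputs and then compute per-class attributions for the samples of interest. The tiny untrained model and random tensors below are placeholders for the paper's trained network and data:

```python
# Minimal SHAP usage sketch: attribute a classifier's outputs to its input
# pixels. Model and data are untrained placeholders, not the paper's setup.
import shap
import torch

model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 26)
)
background = torch.randn(20, 3, 64, 64)  # reference distribution for the explainer
samples = torch.randn(4, 3, 64, 64)      # inputs whose predictions we explain

explainer = shap.GradientExplainer(model, background)
shap_values = explainer.shap_values(samples)
# One attribution array per output class, aligned with the input shape
# (the exact return container varies across shap versions).
```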
- CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models [59.91221728187576]
This paper introduces the CMU Linguistic Annotation Backend (CMULAB), an open-source framework that simplifies model deployment and continuous human-in-the-loop fine-tuning of NLP models.
CMULAB enables users to leverage the power of multilingual models to quickly adapt and extend existing tools for speech recognition, OCR, translation, and syntactic analysis to new languages.
arXiv Detail & Related papers (2024-04-03T02:21:46Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model to encode speech in bidirectional rich encodings.
The approach uses a masked language modelling objective to learn the representations.
We show that the performance of the resulting encodings is better than comparable models on multiple datasets.
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
- ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting [121.11880210592497]
We argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) language model with noise input.
We propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting.
arXiv Detail & Related papers (2022-11-19T03:50:33Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [43.45785951443149]
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts.
Current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences.
We tackle large-scale SLP by learning to co-articulate between dictionary signs.
We also propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos.
arXiv Detail & Related papers (2022-03-29T08:51:38Z)
- Language Model-Based Paired Variational Autoencoders for Robotic Language Learning [18.851256771007748]
Similar to human infants, artificial agents can learn language while interacting with their environment.
We present a neural model that bidirectionally binds robot actions and their language descriptions in a simple object manipulation scenario.
Next, we introduce PVAE-BERT, which equips the model with a pretrained large-scale language model.
arXiv Detail & Related papers (2022-01-17T10:05:26Z)
- Multi-Modal Zero-Shot Sign Language Recognition [51.07720650677784]
We propose a multi-modal Zero-Shot Sign Language Recognition model.
A Transformer-based model along with a C3D model is used for hand detection and deep feature extraction.
A semantic space is used to map the visual features to the lingual embedding of the class labels.
arXiv Detail & Related papers (2021-09-02T09:10:39Z)
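The semantic-space idea in this last entry reduces to projecting visual features into a label-embedding space and scoring them against class-name embeddings, which is what lets unseen classes be recognized. A skeletal version follows; all dimensions and the random embeddings are placeholders, not the paper's components:

```python
# Sketch of zero-shot classification via a shared semantic space: visual
# features are projected into the label-embedding space and scored against
# the embeddings of class names, including classes unseen in training.
import torch
import torch.nn as nn

visual_dim, text_dim = 512, 300
projector = nn.Linear(visual_dim, text_dim)   # learned visual -> semantic map

label_embeddings = torch.randn(10, text_dim)  # stand-in word vectors per class
label_embeddings = nn.functional.normalize(label_embeddings, dim=-1)

video_features = torch.randn(1, visual_dim)   # e.g., pooled C3D/Transformer output
z = nn.functional.normalize(projector(video_features), dim=-1)
scores = z @ label_embeddings.T               # cosine similarity to each label
predicted_class = int(scores.argmax())
```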
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.