Real-Time Sign Language Gestures to Speech Transcription using Deep Learning
- URL: http://arxiv.org/abs/2508.12713v1
- Date: Mon, 18 Aug 2025 08:25:18 GMT
- Title: Real-Time Sign Language Gestures to Speech Transcription using Deep Learning
- Authors: Brandone Fonya
- Abstract summary: This project introduces a real-time assistive technology solution that leverages advanced deep learning techniques to translate sign language gestures into textual and audible speech. By employing convolutional neural networks (CNNs) trained on the Sign Language MNIST dataset, the system accurately classifies hand gestures captured live via webcam.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Communication barriers pose significant challenges for individuals with hearing and speech impairments, often limiting their ability to effectively interact in everyday environments. This project introduces a real-time assistive technology solution that leverages advanced deep learning techniques to translate sign language gestures into textual and audible speech. By employing convolutional neural networks (CNNs) trained on the Sign Language MNIST dataset, the system accurately classifies hand gestures captured live via webcam. Detected gestures are instantaneously translated into their corresponding meanings and transcribed into spoken language using text-to-speech synthesis, thus facilitating seamless communication. Comprehensive experiments demonstrate high model accuracy and robust real-time performance with some latency, highlighting the system's practical applicability as an accessible, reliable, and user-friendly tool for enhancing the autonomy and integration of sign language users in diverse social settings.
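As a rough illustration of the pipeline the abstract describes (webcam capture, CNN classification of 28x28 grayscale crops in the Sign Language MNIST format, and text-to-speech output), a minimal sketch follows. The model file, confidence threshold, and fixed region-of-interest crop are illustrative assumptions, not details taken from the paper:

```python
# Minimal sketch of a webcam gesture-to-speech loop (illustrative, not the
# authors' code). Assumes a Keras CNN trained on Sign Language MNIST
# (28x28 grayscale; the static alphabet excludes the motion letters J and Z).
import cv2
import pyttsx3
from tensorflow import keras

model = keras.models.load_model("sign_mnist_cnn.h5")  # hypothetical model file
LABELS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # label indices follow the dataset's A-Z order

tts = pyttsx3.init()
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Fixed central crop as a stand-in for a real hand detector.
    h, w = frame.shape[:2]
    roi = frame[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(gray, (28, 28)).astype("float32") / 255.0
    probs = model.predict(img[None, :, :, None], verbose=0)[0]
    if probs.max() > 0.9:  # speak only confident predictions (threshold assumed)
        tts.say(LABELS[int(probs.argmax())])
        tts.runAndWait()  # blocking; a real-time system would speak asynchronously
    cv2.imshow("sign-to-speech", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

The blocking runAndWait() call is one plausible source of the latency the abstract mentions; speaking only when the predicted letter changes, or synthesizing on a background thread, would keep the capture loop responsive.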
Related papers
- Indian Sign Language Detection for Real-Time Translation using Machine Learning [0.1747623282473278]
We propose a robust, real-time ISL detection and translation system built upon a Convolutional Neural Network (CNN). Our model is trained on a comprehensive ISL dataset and demonstrates exceptional performance, achieving a classification accuracy of 99.95%. For real-time implementation, the framework integrates MediaPipe for precise hand tracking and motion detection, enabling seamless translation of dynamic gestures.
arXiv Detail & Related papers (2025-07-27T21:15:46Z)
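Hand tracking with MediaPipe, as in the ISL entry above, is commonly wired up as in the sketch below. This is a generic illustration rather than the authors' pipeline; the downstream classifier is hypothetical, and the 21 tracked landmarks are simply flattened into a 63-dimensional feature vector:

```python
# Generic MediaPipe hand-landmark extraction (a sketch, not the paper's code).
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(
    static_image_mode=False,      # video mode: track landmarks across frames
    max_num_hands=1,
    min_detection_confidence=0.5,
)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV captures BGR.
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        features = [v for p in lm for v in (p.x, p.y, p.z)]  # 21 points x 3 = 63 values
        # gesture_classifier.predict([features])  # hypothetical downstream model
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```

Feeding landmark coordinates rather than raw pixels makes the downstream classifier largely invariant to lighting and background, which is one reason hand-tracking front ends are popular for real-time sign recognition.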
- ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models [70.56468982313834]
We propose ProsodyLM, which introduces a simple tokenization scheme amenable to learning prosody. We find that ProsodyLM can learn surprisingly diverse emerging prosody processing capabilities through pre-training alone.
arXiv Detail & Related papers (2025-07-27T00:59:01Z)
- Understanding Co-speech Gestures in-the-wild [52.5993021523165]
We introduce a new framework for co-speech gesture understanding in the wild. We propose three new tasks and benchmarks to evaluate a model's capability to comprehend gesture-speech-text associations. We present a new approach that learns a tri-modal video-gesture-speech-text representation to solve these tasks.
arXiv Detail & Related papers (2025-03-28T17:55:52Z)
- Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders [10.664605070306417]
We propose a gesture-aware Automatic Speech Recognition (ASR) system with zero-shot learning for individuals with speech impairments. Experimental results and analyses show that including gesture information significantly enhances semantic understanding.
arXiv Detail & Related papers (2025-02-18T14:15:55Z)
- OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis [73.03333371375]
OpenOmni is a two-stage training framework that integrates omnimodal alignment and speech generation. It surpasses state-of-the-art models across omnimodal, vision-language, and speech-language benchmarks. OpenOmni achieves real-time speech generation with 1s latency in non-autoregressive mode.
arXiv Detail & Related papers (2025-01-08T15:18:09Z)
- Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability [0.0]
We suggest a novel solution that uses a deep neural network to fully automate sign language recognition.
This methodology integrates sophisticated preprocessing techniques to optimise overall performance.
Our model's ability to provide informational clarity was assessed using the SHAP (SHapley Additive exPlanations) method.
arXiv Detail & Related papers (2024-09-11T17:17:44Z)
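SHAP attributions for an image classifier, as used in the entry above, typically take the following shape. This is a minimal sketch assuming a Keras model and NumPy image arrays; the file paths and sample sizes are placeholders, not details from the paper:

```python
# Sketch of SHAP pixel attributions for a CNN sign classifier (illustrative setup).
import numpy as np
import shap
from tensorflow import keras

model = keras.models.load_model("sign_cnn.h5")  # hypothetical model file
X_train = np.load("train_images.npy")           # hypothetical NHWC float arrays
X_test = np.load("test_images.npy")

# DeepExplainer estimates Shapley values relative to a background sample.
background = X_train[np.random.choice(len(X_train), 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X_test[:5])  # per-class pixel attributions

# Highlight which pixels pushed the prediction toward each class.
shap.image_plot(shap_values, X_test[:5])
```

The resulting plots show, per class, which image regions raised or lowered the predicted probability, which is what lets the authors assess the model's "informational clarity".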
- EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event cameras can naturally perceive dynamic hand movements, providing rich manual cues for sign language tasks.
We propose an efficient transformer-based framework for event-based SLR and SLT tasks.
Our method performs favorably against existing state-of-the-art approaches with only 0.34% of the computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z)
- A Transformer-Based Multi-Stream Approach for Isolated Iranian Sign Language Recognition [0.0]
This research aims to recognize Iranian Sign Language words with the help of the latest deep learning tools such as transformers.
The dataset used includes 101 Iranian Sign Language words frequently used in academic environments such as universities.
arXiv Detail & Related papers (2024-06-27T06:54:25Z)
- TRAVID: An End-to-End Video Translation Framework [1.6131714685439382]
We present an end-to-end video translation system that not only translates spoken language but also synchronizes the translated speech with the lip movements of the speaker.
Our system focuses on translating educational lectures in various Indian languages, and it is designed to be effective even in low-resource system settings.
arXiv Detail & Related papers (2023-09-20T14:13:05Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation [Jin et al., 2020], we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
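The skeleton/RGB-D ensembling mentioned in the entry above is often implemented as a simple late fusion of per-class probabilities. The sketch below illustrates that generic idea; it is our illustration, not the paper's implementation:

```python
# Late fusion of two recognition streams: average their class probabilities
# (a generic illustration of ensembling, not the paper's method).
import numpy as np

def late_fusion(skeleton_probs: np.ndarray,
                rgb_probs: np.ndarray,
                weight: float = 0.5) -> np.ndarray:
    """Weighted average of two (num_samples, num_classes) probability arrays."""
    fused = weight * skeleton_probs + (1.0 - weight) * rgb_probs
    return fused.argmax(axis=1)  # fused class predictions

# Example: three samples, four sign classes.
skel = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.2, 0.5, 0.2, 0.1],
                 [0.1, 0.1, 0.2, 0.6]])
rgb = np.array([[0.6, 0.2, 0.1, 0.1],
                [0.1, 0.3, 0.5, 0.1],
                [0.2, 0.1, 0.1, 0.6]])
print(late_fusion(skel, rgb))  # -> [0 1 3]
```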
- Learning Adaptive Language Interfaces through Decomposition [89.21937539950966]
We introduce a neural semantic parsing system that learns new high-level abstractions through decomposition.
Users interactively teach the system by breaking down high-level utterances describing novel behavior into low-level steps.
arXiv Detail & Related papers (2020-10-11T08:27:07Z)
- Meta-Transfer Learning for Code-Switched Speech Recognition [72.84247387728999]
We propose a new learning method, meta-transfer learning, for transfer learning on a code-switched speech recognition system in a low-resource setting.
Our model learns to recognize individual languages and transfers this knowledge to better recognize mixed-language speech by conditioning the optimization on code-switching data.
arXiv Detail & Related papers (2020-04-29T14:27:19Z)