Real-Time Sign Language to text Translation using Deep Learning: A Comparative study of LSTM and 3D CNN
- URL: http://arxiv.org/abs/2510.13137v1
- Date: Wed, 15 Oct 2025 04:26:33 GMT
- Title: Real-Time Sign Language to text Translation using Deep Learning: A Comparative study of LSTM and 3D CNN
- Authors: Madhumati Pol, Anvay Anturkar, Anushka Khot, Ayush Andure, Aniruddha Ghosh, Anvit Magadum, Anvay Bahadur
- Abstract summary: This study investigates the performance of 3D Convolutional Neural Networks (3D CNNs) and Long Short-Term Memory (LSTM) networks for real-time American Sign Language (ASL) recognition. Experimental results demonstrate that 3D CNNs achieve 92.4% recognition accuracy but require 3.2% more processing time per frame compared to LSTMs, which maintain 86.7% accuracy with significantly lower resource consumption. This project provides professional benchmarks for developing assistive technologies, highlighting trade-offs between recognition precision and real-time operational requirements in edge computing environments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the performance of 3D Convolutional Neural Networks (3D CNNs) and Long Short-Term Memory (LSTM) networks for real-time American Sign Language (ASL) recognition. While 3D CNNs excel at spatiotemporal feature extraction from video sequences, LSTMs are optimized for modeling temporal dependencies in sequential data. We evaluate both architectures on a dataset containing 1,200 ASL signs across 50 classes, comparing their accuracy, computational efficiency, and latency under similar training conditions. Experimental results demonstrate that 3D CNNs achieve 92.4% recognition accuracy but require 3.2% more processing time per frame compared to LSTMs, which maintain 86.7% accuracy with significantly lower resource consumption. The hybrid 3D CNN-LSTM model shows balanced performance, which suggests that context-dependent architecture selection is crucial for practical implementation. This project provides professional benchmarks for developing assistive technologies, highlighting trade-offs between recognition precision and real-time operational requirements in edge computing environments.
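As a rough illustration of the two architectures compared above, the following minimal NumPy sketch (a toy under assumed shapes and dimensions, not the paper's implementation) contrasts a naive 3D convolution, which extracts spatial and temporal structure jointly from a clip, with a single LSTM cell that consumes per-frame feature vectors sequentially:

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive 'valid' 3D convolution: slides a (t, h, w) kernel over a
    (T, H, W) clip, capturing space and time in one operation."""
    T, H, W = video.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(video[i:i+t, j:j+h, k:k+w] * kernel)
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step over a per-frame feature vector x: input, forget,
    output, and candidate gates update the cell state c."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # stacked gate pre-activations
    i = sigmoid(z[:n])                  # input gate
    f = sigmoid(z[n:2*n])               # forget gate
    o = sigmoid(z[2*n:3*n])             # output gate
    g = np.tanh(z[3*n:])                # candidate cell values
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16, 16))        # 8 frames of 16x16 pixels
feat3d = conv3d_valid(clip, rng.standard_normal((3, 3, 3)))
print(feat3d.shape)                            # (6, 14, 14)

d, n = 10, 4                                   # feature dim, hidden dim
W = rng.standard_normal((4*n, d))
U = rng.standard_normal((4*n, n))
b = np.zeros(4*n)
h = c = np.zeros(n)
for frame_feat in rng.standard_normal((8, d)): # one feature vector per frame
    h, c = lstm_step(frame_feat, h, c, W, U, b)
print(h.shape)                                 # (4,)
```

The sketch makes the trade-off concrete: the 3D convolution touches every voxel of a spatiotemporal window (hence the higher per-frame cost reported above), while the LSTM only propagates a small hidden state from frame to frame.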
Related papers
- Towards Sample Efficient Entanglement Classification for 3 and 4 Qubit Systems: A Tailored CNN-BiLSTM Approach [6.448866790627225]
We propose a hybrid neural network architecture integrating Convolutional and Bidirectional Long Short-Term Memory networks (CNN-BiLSTM). This design leverages CNNs for local feature extraction and BiLSTMs for sequential dependency modeling, enabling robust feature learning from minimal training data. When trained on only 100 samples, Architecture 2 maintains classification accuracies exceeding 90% for both 3-qubit and 4-qubit systems, demonstrating rapid loss convergence within tens of epochs.
arXiv Detail & Related papers (2026-01-30T04:59:44Z) - Real-Time American Sign Language Recognition Using 3D Convolutional Neural Networks and LSTM: Architecture, Training, and Deployment [0.0]
This paper presents a real-time American Sign Language (ASL) recognition system utilizing a hybrid deep learning architecture. The system processes webcam video streams to recognize word-level ASL signs, addressing communication barriers for over 70 million deaf and hard-of-hearing individuals worldwide.
arXiv Detail & Related papers (2025-12-19T00:17:43Z) - A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition [0.0]
We implement and evaluate two representative models, ConvLSTM and Vanilla Transformer, on the Azerbaijani Sign Language dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy.
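To illustrate the recurrent-vs-attention distinction this comparison rests on, here is a minimal NumPy sketch of scaled dot-product self-attention (a generic textbook formulation, not the paper's model): unlike a ConvLSTM, which carries state frame by frame, attention lets every frame weight every other frame in a single step.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Core of the Transformer: pairwise affinities between all frames
    are computed at once, with no sequential state propagation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (T, T) frame-to-frame affinities
    weights = softmax(scores, axis=-1)  # each row is a distribution over frames
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 6, 8                             # 6 frames, 8-dim embeddings (illustrative)
X = rng.standard_normal((T, d))
out, attn = scaled_dot_product_attention(X, X, X)  # self-attention: Q = K = V
print(out.shape, attn.shape)            # (6, 8) (6, 6)
```

Because every row of the attention matrix sums to 1, each output frame is a convex combination of all input frames, which is one intuition for why attention handles long-range temporal dependencies more directly than recurrence.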
arXiv Detail & Related papers (2025-11-17T08:28:35Z) - SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding [64.86119288520419]
Multimodal language models struggle with spatial reasoning across time and space. We present SIMS-V, a systematic data-generation framework that leverages the privileged information of 3D simulators. Our approach demonstrates robust generalization, maintaining performance on general video understanding while showing substantial improvements on embodied and real-world spatial tasks.
arXiv Detail & Related papers (2025-11-06T18:53:31Z) - Three-Class Text Sentiment Analysis Based on LSTM [0.0]
This paper introduces a three-class sentiment classification method for Weibo comments using Long Short-Term Memory (LSTM) networks. Experimental results demonstrate superior performance, achieving an accuracy of 98.31% and an F1 score of 98.28%.
arXiv Detail & Related papers (2024-12-23T07:21:07Z) - Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse [56.384390765357004]
We propose an integrated federated split learning and hyperdimensional computing framework for emerging foundation models.
This novel approach reduces communication costs, computation load, and privacy risks, making it suitable for resource-constrained edge devices in the Metaverse.
arXiv Detail & Related papers (2024-08-26T17:03:14Z) - Hybrid CNN Bi-LSTM neural network for Hyperspectral image classification [1.2691047660244332]
This paper proposes a neural network combining 3-D CNN, 2-D CNN and Bi-LSTM.
It achieves 99.83, 99.98, and 100 percent accuracy using only 30 percent of the trainable parameters of the state-of-the-art model on the IP, PU, and SA datasets, respectively.
arXiv Detail & Related papers (2024-02-15T15:46:13Z) - Disentangling Spatial and Temporal Learning for Efficient Image-to-Video
Transfer Learning [59.26623999209235]
We present DiST, which disentangles the learning of spatial and temporal aspects of videos.
The disentangled learning in DiST is highly efficient because it avoids the back-propagation of massive pre-trained parameters.
Extensive experiments on five benchmarks show that DiST delivers better performance than existing state-of-the-art methods by convincing margins.
arXiv Detail & Related papers (2023-09-14T17:58:33Z) - Audio-Visual Efficient Conformer for Robust Speech Recognition [91.3755431537592]
We propose to improve the noise robustness of the recently proposed Efficient Conformer Connectionist Temporal Classification architecture by processing both audio and visual modalities.
Our experiments show that using audio and visual modalities allows the model to better recognize speech in the presence of environmental noise and significantly accelerates training, reaching a lower WER with 4 times fewer training steps.
arXiv Detail & Related papers (2023-01-04T05:36:56Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Decoding ECoG signal into 3D hand translation using deep learning [3.20238141000059]
Motor brain-computer interfaces (BCIs) are promising technology that may enable motor-impaired people to interact with their environment.
Most ECoG signal decoders used to predict continuous hand movements are linear models.
Deep learning models, which are state-of-the-art in many problems, could be a solution to better capture this relationship.
arXiv Detail & Related papers (2021-10-05T15:41:04Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Drowsiness Detection Based On Driver Temporal Behavior Using a New
Developed Dataset [1.8811803364757564]
We apply the YOLOv3 (You Only Look Once, version 3) CNN to extract facial features automatically.
Then, an LSTM neural network is employed to learn driver temporal behaviors, including yawning and blinking time periods.
Results indicate the ability of the hybrid CNN-LSTM approach in drowsiness detection and the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-03-31T21:15:29Z) - A journey in ESN and LSTM visualisations on a language task [77.34726150561087]
We trained ESNs and LSTMs on a Cross-Situational Learning (CSL) task.
The results are of three kinds: performance comparison, internal dynamics analyses and visualization of latent space.
arXiv Detail & Related papers (2020-12-03T08:32:01Z) - Exploiting the ConvLSTM: Human Action Recognition using Raw Depth
Video-Based Recurrent Neural Networks [0.0]
We propose and compare two neural networks based on the convolutional long short-term memory unit, namely ConvLSTM.
We show that the proposed models achieve competitive recognition accuracies with lower computational cost compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T23:35:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.