Related papers: A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition

A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition

URL: http://arxiv.org/abs/2511.13126v1
Date: Mon, 17 Nov 2025 08:28:35 GMT
Title: A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition
Authors: Nigar Alishzade, Gulchin Abdullayeva,
Abstract summary: We implement and evaluate two representative models-ConvLSTM and Vanilla Transformer-on the Azerbaijani Sign Language dataset.<n>Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models-ConvLSTM and Vanilla Transformer-on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and temporal modeling. The study provides a nuanced analysis of these trade-offs, offering guidance for architecture selection in sign language recognition systems depending on application requirements and resource constraints.

Related papers

AI Generated Text Detection [0.0]
This paper presents an evaluation of AI text detection methods, including both traditional machine learning models and transformer-based architectures.<n>We utilize two datasets, HC3 and DAIGT v2, to build a unified benchmark and apply a topic-based data split to prevent information leakage.<n>Results indicate that contextual modeling is significantly superior to lexical features and highlight the importance of mitigating topic memorization.
arXiv Detail & Related papers (2026-01-07T11:18:10Z)
Real-Time Sign Language to text Translation using Deep Learning: A Comparative study of LSTM and 3D CNN [0.0]
This study investigates the performance of 3D Contemporalal Neural Networks (3D CNNs) and Long Short-Term Memory (LSTM) networks for real-time American Sign Language (ASL)<n> Experimental results demonstrate that 3D CNN achieve 92.4% recognition accuracy but require 3.2% more processing time per frame compared to LSTMs, which maintain 86.7% accuracy with significantly lower resource consumption.<n>This project provides professional benchmarks for developing assistive technologies, highlighting trade-offs between recognition precision and real-time operational requirements in edge computing environments.
arXiv Detail & Related papers (2025-10-15T04:26:33Z)
ShishuLM: Lightweight Language Model with Hybrid Decoder-MLP Architecture and Paired Weight Sharing [0.5565728870245015]
We introduce an efficient language model architecture, referred to as ShishuLM, which reduces both the parameter count and Key-Value (KV) cache requirements.<n>Our results show that ShishuLM provides up to 25% reduction in memory requirements and up to 40% improvement in latency during both training and inference, compared to parent models.
arXiv Detail & Related papers (2025-10-13T04:04:54Z)
Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model [56.573203512455706]
Large-scale vision-language models (VLMs) have achieved remarkable success in zero-shot learning (ZSL) by leveraging large-scale visual-text pair datasets.<n>One approach to address this issue is to develop interpretable models by integrating language.<n>We propose LaZSL, a locally-aligned vision-language model for interpretable ZSL.
arXiv Detail & Related papers (2025-06-30T13:14:46Z)
SignBart -- New approach with the skeleton sequence for Isolated Sign language Recognition [0.17578923069457017]
This study presents a new novel SLR approach that overcomes the challenge of independently extracting meaningful information from the x and y coordinates of skeleton sequences.<n>With only 749,888 parameters, the model achieves 96.04% accuracy on the LSA-64 dataset.<n>The model also demonstrates excellent performance and generalization across WLASL and ASL-Citizen datasets.
arXiv Detail & Related papers (2025-06-18T07:07:36Z)
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models.<n>We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients.<n>We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z)
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search [53.40810298627443]
ReGUIDE is a framework for web grounding that enables MLLMs to learn data efficiently through self-generated reasoning and spatial-aware criticism.<n>Our experiments demonstrate that ReGUIDE significantly advances web grounding performance across multiple benchmarks.
arXiv Detail & Related papers (2025-05-21T08:36:18Z)
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models [86.83875864328984]
We propose an automated method for synthesizing open-ended logic puzzles, and use it to develop a bilingual benchmark, AutoLogi.<n>Our approach features program-based verification and controllable difficulty levels, enabling more reliable evaluation that better distinguishes models' reasoning abilities.
arXiv Detail & Related papers (2025-02-24T07:02:31Z)
Latent Thought Models with Variational Bayes Inference-Time Computation [52.63299874322121]
Latent Thought Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space.<n>LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models.
arXiv Detail & Related papers (2025-02-03T17:50:34Z)
Training Strategies for Isolated Sign Language Recognition [72.27323884094953]
This paper introduces a comprehensive model training pipeline for Isolated Sign Language Recognition.<n>The constructed pipeline incorporates carefully selected image and video augmentations to tackle the challenges of low data quality and varying sign speeds.
arXiv Detail & Related papers (2024-12-16T08:37:58Z)
Attention vs LSTM: Improving Word-level BISINDO Recognition [0.0]
Indonesia ranks fourth globally in the number of deaf cases.<n>Individuals with hearing impairments often find communication challenging, necessitating the use of sign language.<n>This study aims to explore the application of AI in developing models for a simplified sign language translation app and dictionary.
arXiv Detail & Related papers (2024-09-03T15:17:39Z)
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.