Related papers: CSLRConformer: A Data-Centric Conformer Approach for Continuous Arabic Sign Language Recognition on the Isharah Datase

CSLRConformer: A Data-Centric Conformer Approach for Continuous Arabic Sign Language Recognition on the Isharah Datase

URL: http://arxiv.org/abs/2508.01791v1
Date: Sun, 03 Aug 2025 14:58:50 GMT
Title: CSLRConformer: A Data-Centric Conformer Approach for Continuous Arabic Sign Language Recognition on the Isharah Datase
Authors: Fatimah Mohamed Emad Elden,
Abstract summary: This paper addresses the challenge of signer-independent recognition to advance the capabilities of Continuous Sign Language Recognition systems.<n>A data-centric methodology is proposed, centered on systematic feature engineering, a robust preprocessing pipeline, and an optimized model architecture.<n>The architecture adapts the hybrid CNN-Transformer design of the Conformer model, leveraging its capacity to model local temporal dependencies and global sequence context.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The field of Continuous Sign Language Recognition (CSLR) poses substantial technical challenges, including fluid inter-sign transitions, the absence of temporal boundaries, and co-articulation effects. This paper, developed for the MSLR 2025 Workshop Challenge at ICCV 2025, addresses the critical challenge of signer-independent recognition to advance the generalization capabilities of CSLR systems across diverse signers. A data-centric methodology is proposed, centered on systematic feature engineering, a robust preprocessing pipeline, and an optimized model architecture. Key contributions include a principled feature selection process guided by Exploratory Data Analysis (EDA) to isolate communicative keypoints, a rigorous preprocessing pipeline incorporating DBSCAN-based outlier filtering and spatial normalization, and the novel CSLRConformer architecture. This architecture adapts the hybrid CNN-Transformer design of the Conformer model, leveraging its capacity to model local temporal dependencies and global sequence context; a characteristic uniquely suited for the spatio-temporal dynamics of sign language. The proposed methodology achieved a competitive performance, with a Word Error Rate (WER) of 5.60% on the development set and 12.01% on the test set, a result that secured a 3rd place ranking on the official competition platform. This research validates the efficacy of cross-domain architectural adaptation, demonstrating that the Conformer model, originally conceived for speech recognition, can be successfully repurposed to establish a new state-of-the-art performance in keypoint-based CSLR.

Related papers

Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation [53.16213723669751]
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding.<n>However, their direct deployment is often hindered by high computational complexity and resource requirements.<n>This paper proposes a novel knowledge distillation based semantic communication framework.
arXiv Detail & Related papers (2025-08-04T07:47:18Z)
DESign: Dynamic Context-Aware Convolution and Efficient Subnet Regularization for Continuous Sign Language Recognition [11.879737436401175]
We propose DESign, a novel framework that incorporates Dynamic Context-Aware Convolution (DCAC) and Subnet Regularization Connectionist Temporal Classification (SR-CTC)<n>DCAC dynamically captures the inter-frame motion cues that constitute signs and uniquely adapts convolutional weights based on contextual information.<n>SR-CTC regularizes training by applying supervision tovolutionworks, encouraging the model to explore diverse CTC alignment paths and effectively preventing overfitting.
arXiv Detail & Related papers (2025-07-04T06:56:28Z)
Modeling and Visualization Reasoning for Stakeholders in Education and Industry Integration Systems: Research on Structured Synthetic Dialogue Data Generation Based on NIST Standards [3.5516803380598074]
This study addresses the structural complexity and semantic ambiguity in stakeholder interactions within the Education-Industry Integration (EII) system.<n>We propose a structural modeling paradigm based on the National Institute of Standards and Technology (NIST) synthetic data quality framework.
arXiv Detail & Related papers (2025-06-20T12:37:43Z)
Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.<n>We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.<n>We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
CSE-SFP: Enabling Unsupervised Sentence Representation Learning via a Single Forward Pass [3.0566617373924325]
Recent advances in pre-trained language models (PLMs) have driven remarkable progress in this field.<n>We propose CSE-SFP, an innovative method that exploits the structural characteristics of generative models.<n>We show that CSE-SFP not only produces higher-quality embeddings but also significantly reduces both training time and memory consumption.
arXiv Detail & Related papers (2025-05-01T08:27:14Z)
Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling [48.78361527873024]
We propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies. We introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance.
arXiv Detail & Related papers (2024-09-09T15:12:28Z)
Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining [0.6144680854063939]
State-of-the-art Conformer model for Speech Recognition is adapted for continuous sign language recognition. This marks the first instance of employing Conformer for a vision-based task. Unsupervised pretraining is conducted on a curated sign language dataset.
arXiv Detail & Related papers (2024-05-20T13:40:52Z)
A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR) The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.