Related papers: Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings

Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings

URL: http://arxiv.org/abs/2506.21386v1
Date: Thu, 26 Jun 2025 15:36:25 GMT
Title: Hybrid Deep Learning and Signal Processing for Arabic Dialect Recognition in Low-Resource Settings
Authors: Ghazal Al-Shwayyat, Omer Nezih Gerek,
Abstract summary: Arabic dialect recognition presents a significant challenge due to the linguistic diversity of Arabic and the scarcity of large annotated datasets.<n>This research investigates hybrid modeling strategies that integrate classical signal processing techniques with deep learning architectures.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Arabic dialect recognition presents a significant challenge in speech technology due to the linguistic diversity of Arabic and the scarcity of large annotated datasets, particularly for underrepresented dialects. This research investigates hybrid modeling strategies that integrate classical signal processing techniques with deep learning architectures to address this problem in low-resource scenarios. Two hybrid models were developed and evaluated: (1) Mel-Frequency Cepstral Coefficients (MFCC) combined with a Convolutional Neural Network (CNN), and (2) Discrete Wavelet Transform (DWT) features combined with a Recurrent Neural Network (RNN). The models were trained on a dialect-filtered subset of the Common Voice Arabic dataset, with dialect labels assigned based on speaker metadata. Experimental results demonstrate that the MFCC + CNN architecture achieved superior performance, with an accuracy of 91.2% and strong precision, recall, and F1-scores, significantly outperforming the Wavelet + RNN configuration, which achieved an accuracy of 66.5%. These findings highlight the effectiveness of leveraging spectral features with convolutional models for Arabic dialect recognition, especially when working with limited labeled data. The study also identifies limitations related to dataset size, potential regional overlaps in labeling, and model optimization, providing a roadmap for future research. Recommendations for further improvement include the adoption of larger annotated corpora, integration of self-supervised learning techniques, and exploration of advanced neural architectures such as Transformers. Overall, this research establishes a strong baseline for future developments in Arabic dialect recognition within resource-constrained environments.

Related papers

SignBart -- New approach with the skeleton sequence for Isolated Sign language Recognition [0.17578923069457017]
This study presents a new novel SLR approach that overcomes the challenge of independently extracting meaningful information from the x and y coordinates of skeleton sequences.<n>With only 749,888 parameters, the model achieves 96.04% accuracy on the LSA-64 dataset.<n>The model also demonstrates excellent performance and generalization across WLASL and ASL-Citizen datasets.
arXiv Detail & Related papers (2025-06-18T07:07:36Z)
Tracking Articulatory Dynamics in Speech with a Fixed-Weight BiLSTM-CNN Architecture [0.0]
This paper presents a novel approach for predicting tongue and lip articulatory features involved in a given speech acoustics.<n>The proposed network is trained with two datasets consisting of simultaneously recorded speech and Electromagnetic Articulography (EMA) datasets.
arXiv Detail & Related papers (2025-04-25T05:57:22Z)
Revolutionizing Communication with Deep Learning and XAI for Enhanced Arabic Sign Language Recognition [0.0]
This study introduces an integrated approach to recognizing Arabic Sign Language (ArSL) using state-of-the-art deep learning models such as MobileNetV3, ResNet50, and EfficientNet-B2.<n>The proposed system not only sets new benchmarks in recognition accuracy but also emphasizes interpretability, making it suitable for applications in healthcare, education, and inclusive communication technologies.
arXiv Detail & Related papers (2025-01-14T14:49:49Z)
Resource-Aware Arabic LLM Creation: Model Adaptation, Integration, and Multi-Domain Testing [0.0]
This paper presents a novel approach to fine-tuning the Qwen2-1.5B model for Arabic language processing using Quantized Low-Rank Adaptation (QLoRA) on a system with only 4GB VRAM.<n>We detail the process of adapting this large language model to the Arabic domain, using diverse datasets including Bactrian, OpenAssistant, and Wikipedia Arabic corpora.<n> Experimental results over 10,000 training steps show significant performance improvements, with the final loss converging to 0.1083.
arXiv Detail & Related papers (2024-12-23T13:08:48Z)
In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in context language learning (ICLL) We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
A Sequence-to-Sequence Approach for Arabic Pronoun Resolution [0.0]
This paper proposes a sequence-to-sequence learning approach for Arabic pronoun resolution. The proposed approach is evaluated on the AnATAr dataset.
arXiv Detail & Related papers (2023-05-19T08:53:41Z)
Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex. In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection [54.83707803301847]
The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection. The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
arXiv Detail & Related papers (2022-06-04T03:17:15Z)
Research on Dual Channel News Headline Classification Based on ERNIE Pre-training Model [13.222137788045416]
The proposed model improves the accuracy, precision and F1-score of news headline classification compared with the traditional neural network model. It can perform well in the multi-classification application of news headline text under large data volume.
arXiv Detail & Related papers (2022-02-14T10:44:12Z)
Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods. We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments. The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction. It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition. We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline. We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures. Our best end-to-end model based on RNN-Transducer, together with improved beam search, reaches quality by only 3.8% WER abs. worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.