Fugu-MT Paper Translation (Abstract): Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education

Paper overview: Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education

arxiv url: http://arxiv.org/abs/2603.20255v1
Date: Wed, 11 Mar 2026 08:03:52 GMT
Status: Translation complete
Last updated in system: 2026-04-06 02:36:12.940121
Title: Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education
Title (reference translation): Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education
Authors: Abdul Aziz Snoubara, Baraa Al_Maradni, Haya Al_Naal, Malek Al_Madrmani, Roaa Jdini, Seedra Zarzour, Khloud Al Jallad
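Abstract summary: This paper presents Abjad-Kids, an Arabic speech dataset designed for kindergarten and primary education. The dataset consists of 46397 audio samples collected from children aged 3 to 12 years, covering 141 classes. The paper proposes a hierarchical audio classification approach based on CNN-LSTM architectures.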
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Speech-based AI educational applications have gained significant interest in recent years, particularly for children. However, research on children's speech remains limited due to the lack of publicly available datasets, especially for low-resource languages such as Arabic. This paper presents Abjad-Kids, an Arabic speech dataset designed for kindergarten and primary education, focusing on fundamental learning of the alphabet, numbers, and colors. The dataset consists of 46397 audio samples collected from children aged 3 to 12 years, covering 141 classes. All samples were recorded under controlled specifications to ensure consistency in duration, sampling rate, and format. To address high intra-class similarity among Arabic phonemes and the limited samples per class, we propose a hierarchical audio classification approach based on CNN-LSTM architectures. Our proposed methodology decomposes alphabet recognition into a two-stage process: an initial grouping classification model followed by specialized classifiers for each group. Two grouping strategies, static linguistic-based grouping and dynamic clustering-based grouping, were evaluated. Experimental results demonstrate that static linguistic-based grouping achieves superior performance. Comparisons between traditional machine learning and deep learning approaches highlight the effectiveness of CNN-LSTM models combined with data augmentation. Despite promising results, most of our experiments indicate a challenge with overfitting, which is likely due to the limited number of samples, even after data augmentation and model regularization. Thus, future work may focus on collecting additional data to address this issue. Abjad-Kids will be publicly available. We hope that Abjad-Kids will enrich the representation of children in speech datasets and be a good resource for future research in Arabic speech classification for kids.
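Abstract (reference translation): Speech-based AI educational applications have attracted considerable interest in recent years, particularly for children. However, for low-resource languages such as Arabic, research on children's speech is limited by a shortage of public datasets. Abjad-Kids, an Arabic speech dataset designed for kindergarten and primary education, focuses on basic learning of the alphabet, numbers, and colors. The dataset consists of 46397 audio samples collected from children aged 3 to 12 years and covers 141 classes. All samples were recorded under controlled specifications to ensure consistency in duration, sampling rate, and format. A hierarchical audio classification approach based on CNN-LSTM architectures is proposed. The proposed method decomposes alphabet recognition into a two-stage process. Two strategies were evaluated: static linguistic-based grouping and dynamic clustering-based grouping. Experimental results show that static linguistic-based grouping performs better. A comparison of traditional machine learning and deep learning approaches highlights the effectiveness of combining CNN-LSTM models with data augmentation. Despite promising results, most of the experiments show signs of overfitting, likely because the number of samples is limited, even after data augmentation and model regularization. Future work may therefore focus on collecting additional data to address this issue. Abjad-Kids will be made publicly available. The authors hope that Abjad-Kids will enrich the representation of children in speech datasets and become a good resource for future research on Arabic speech classification.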
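
The two-stage process described in the abstract (a grouping model followed by one specialized classifier per group) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the nearest-centroid classifier is a simple stand-in and not the CNN-LSTM models the paper uses, and the 2-D feature vectors, class labels, and group names are hypothetical toy values rather than data from Abjad-Kids.

```python
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


class NearestCentroid:
    """Stand-in classifier (NOT a CNN-LSTM): predicts the label of the closest class mean."""

    def fit(self, X, y):
        rows_by_label = defaultdict(list)
        for x, label in zip(X, y):
            rows_by_label[label].append(x)
        # One centroid per label: the coordinate-wise mean of its samples.
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in rows_by_label.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: sq_dist(x, self.centroids[label]))


class HierarchicalClassifier:
    """Stage 1 picks a group; stage 2 uses that group's own classifier."""

    def __init__(self, class_to_group):
        self.class_to_group = class_to_group  # static mapping, assumed given
        self.group_model = NearestCentroid()
        self.specialists = {}

    def fit(self, X, y):
        groups = [self.class_to_group[label] for label in y]
        self.group_model.fit(X, groups)
        for g in set(groups):
            # Train each specialist only on the samples of its own group.
            Xg = [x for x, gg in zip(X, groups) if gg == g]
            yg = [label for label, gg in zip(y, groups) if gg == g]
            self.specialists[g] = NearestCentroid().fit(Xg, yg)
        return self

    def predict(self, x):
        group = self.group_model.predict(x)
        return group, self.specialists[group].predict(x)


# Hypothetical toy data: four classes in two hand-assigned groups.
X = [(0.0, 0.0), (0.2, 0.1), (1.0, 0.0), (1.1, 0.2),
     (5.0, 5.0), (5.1, 5.2), (6.0, 5.0), (6.2, 5.1)]
y = ["a", "a", "b", "b", "c", "c", "d", "d"]
class_to_group = {"a": "G1", "b": "G1", "c": "G2", "d": "G2"}

model = HierarchicalClassifier(class_to_group).fit(X, y)
print(model.predict((0.9, 0.1)))
print(model.predict((5.05, 5.1)))
```

In this sketch the class-to-group mapping is fixed in advance, which loosely mirrors the static grouping strategy. A dynamic variant would instead derive the mapping by clustering, which is not shown here.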


Related papers