An End-to-End Approach for Child Reading Assessment in the Xhosa Language
- URL: http://arxiv.org/abs/2505.17371v2
- Date: Mon, 02 Jun 2025 07:47:01 GMT
- Title: An End-to-End Approach for Child Reading Assessment in the Xhosa Language
- Authors: Sergio Chevtchenko, Nikhil Navas, Rafaella Vale, Franco Ubaudi, Sipumelele Lucwaba, Cally Ardington, Soheil Afshar, Mark Antoniou, Saeed Afshar
- Abstract summary: This study focuses on Xhosa, a language spoken in South Africa, to advance child speech recognition capabilities. We present a novel dataset composed of child speech samples in Xhosa. The results indicate that the performance of these models can be significantly influenced by the amount and balancing of the available training data.
- Score: 0.3579433677269426
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Child literacy is a strong predictor of outcomes at subsequent stages of an individual's life. This points to a need for targeted interventions in vulnerable low- and middle-income populations to help bridge the gap between literacy levels in these regions and high-income ones. To this end, reading assessments provide an important tool for measuring the effectiveness of such programs, and AI can be a reliable and economical tool to support educators with this task. Developing accurate automatic reading assessment systems for child speech in low-resource languages poses significant challenges due to limited data and the unique acoustic properties of children's voices. This study focuses on Xhosa, a language spoken in South Africa, to advance child speech recognition capabilities. We present a novel dataset composed of child speech samples in Xhosa. The dataset is available upon request and contains ten words and letters, which are part of the Early Grade Reading Assessment (EGRA) system. Each recording is labeled with an online and cost-effective approach by multiple markers, and a subsample is validated by an independent EGRA reviewer. This dataset is evaluated with three fine-tuned state-of-the-art end-to-end models: wav2vec 2.0, HuBERT, and Whisper. The results indicate that the performance of these models can be significantly influenced by the amount and balancing of the available training data, which is fundamental for cost-effective large-scale dataset collection. Furthermore, our experiments indicate that wav2vec 2.0 performance is improved by training on multiple classes at a time, even when the number of available samples is constrained.
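The abstract stresses that the amount and, in particular, the balancing of training data strongly influence model performance. A minimal sketch of one common balancing strategy, oversampling minority classes before fine-tuning; the function and label names are illustrative and not taken from the paper:

```python
# Sketch: balance a labelled set of recordings by oversampling minority
# classes until every class matches the size of the largest class.
# This is a generic strategy, not the paper's actual pipeline.
import random
from collections import defaultdict

def oversample_balance(samples, labels, seed=0):
    """Return (samples, labels) where every class has as many items as
    the largest class, duplicating minority-class items at random."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Keep all originals, then pad with random duplicates up to target.
        picks = items + [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(picks)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

Oversampling duplicates scarce recordings rather than discarding abundant ones, which matters when, as here, data collection is costly.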
Related papers
- Automated evaluation of children's speech fluency for low-resource languages [8.918459083715149]
This paper proposes a system to automatically assess fluency by combining a fine-tuned multilingual ASR model and an objective metrics extraction stage. We evaluate the proposed system on a dataset of children's speech in two low-resource languages, Tamil and Malay.
arXiv Detail & Related papers (2025-05-26T08:25:50Z) - Automatic Proficiency Assessment in L2 English Learners [51.652753736780205]
Second language (L2) proficiency in English is usually perceptually evaluated by English teachers or expert evaluators. This paper explores deep learning techniques for comprehensive L2 proficiency assessment, addressing both the speech signal and its corresponding transcription.
arXiv Detail & Related papers (2025-05-05T12:36:03Z) - Deep Learning for Assessment of Oral Reading Fluency [5.707725771108279]
This work investigates end-to-end modeling on a training dataset of children's audio recordings of story texts labeled by human experts.
We report the performance of a number of system variations on the relevant measures, and probe the learned embeddings for lexical and acoustic-prosodic features known to be important to the perception of reading fluency.
arXiv Detail & Related papers (2024-05-29T18:09:35Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - On the effect of curriculum learning with developmental data for grammar acquisition [4.4044968357361745]
This work explores the degree to which grammar acquisition is driven by language simplicity and the source modality (speech vs. text) of data.
We find that grammar acquisition is largely driven by exposure to speech data, and in particular through exposure to two of the BabyLM training corpora: AO-Childes and Open Subtitles.
arXiv Detail & Related papers (2023-10-31T20:05:30Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text [4.863310073296471]
We propose 2SDiac, a multi-source model that can effectively support optional diacritics in input to inform all predictions.
We also introduce Guided Learning, a training scheme to leverage given diacritics in input with different levels of random masking.
arXiv Detail & Related papers (2023-06-06T10:18:17Z) - BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z) - Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z) - Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z) - Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit [3.7373314439051106]
Hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia.
We introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data.
We show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall.
arXiv Detail & Related papers (2021-06-14T11:17:52Z) - Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
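One of the summaries above reports results as unweighted average recall (UAR, 73.4%), a metric that averages per-class recall so that rare classes count as much as frequent ones. A minimal, self-contained sketch of how UAR is computed; the class labels are illustrative:

```python
# Sketch: unweighted average recall (UAR). Each class's recall is the
# fraction of its true instances that were correctly predicted; UAR is
# the plain mean of these recalls, ignoring class frequencies.
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)
```

Because UAR weights every class equally, it is a common choice for imbalanced tasks such as emotion recognition, where accuracy alone would reward always predicting the majority class.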
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.