Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language
- URL: http://arxiv.org/abs/2602.05406v1
- Date: Thu, 05 Feb 2026 07:44:13 GMT
- Title: Enabling Automatic Disordered Speech Recognition: An Impaired Speech Dataset in the Akan Language
- Authors: Isaac Wiafe, Akon Obu Ekpezu, Sumaya Ahmed Salihs, Elikem Doe Atsakpo, Fiifi Baffoe Payin Winful, Jamal-Deen Abdulai,
- Abstract summary: This study presents a curated corpus of speech samples from native Akan speakers with speech impairment. The dataset comprises 50.01 hours of audio recordings spanning four classes of impaired speech: stammering, cerebral palsy, cleft palate, and stroke-induced speech disorder.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The lack of impaired speech data hinders advancements in the development of inclusive speech technologies, particularly in low-resource languages such as Akan. To address this gap, this study presents a curated corpus of speech samples from native Akan speakers with speech impairment. The dataset comprises 50.01 hours of audio recordings spanning four classes of impaired speech: stammering, cerebral palsy, cleft palate, and stroke-induced speech disorder. Recordings were made in controlled, supervised environments where participants described pre-selected images in their own words. The resulting dataset is a collection of audio recordings, transcriptions, and associated metadata on speaker demographics, class of impairment, recording environment, and device. The dataset is intended to support research in low-resource automatic disordered speech recognition systems and assistive speech technology.
Related papers
- iMiGUE-Speech: A Spontaneous Speech Dataset for Affective Analysis [7.298729249943839]
iMiGUE-Speech is an extension of the iMiGUE dataset that provides a spontaneous affective corpus for studying emotional and affective states. iMiGUE-Speech captures spontaneous affect arising naturally from real match outcomes.
arXiv Detail & Related papers (2026-02-25T00:38:19Z) - Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition [8.838919369202525]
Speech impairments resulting from congenital disorders present major challenges to automatic speech recognition systems. State-of-the-art ASR models like Whisper still struggle with non-normative speech due to limited training data availability and high acoustic variability. This work introduces a novel ASR personalization method based on Bayesian Low-rank Adaptation for data-efficient fine-tuning.
arXiv Detail & Related papers (2025-09-23T13:44:58Z) - Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech [0.562479170374811]
Speech impairments caused by conditions such as cerebral palsy or genetic disorders pose significant challenges for automatic speech recognition systems. We propose a practical and lightweight pipeline to personalize ASR models, formalizing the selection of words and enriching a small, speech-impaired dataset with semantic coherence. Our approach shows promising improvements in transcription quality, demonstrating the potential to reduce communication barriers for individuals with atypical speech patterns.
arXiv Detail & Related papers (2025-06-23T15:30:50Z) - Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages [49.31519786009296]
We fine-tune a voice conversion model on English dysarthric speech (UASpeech) to encode both speaker characteristics and prosodic distortions. We then apply it to convert healthy non-English speech (FLEURS) into non-English dysarthric-like speech. The generated data is then used to fine-tune a multilingual ASR model, Massively Multilingual Speech (MMS).
arXiv Detail & Related papers (2025-05-20T20:03:45Z) - Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z) - EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation [83.29199726650899]
The EARS dataset comprises 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data.
The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech.
We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics.
arXiv Detail & Related papers (2024-06-10T11:28:29Z) - RescueSpeech: A German Corpus for Speech Recognition in the Search and Rescue Domain [20.07933161385449]
Speech recognition is still difficult in noisy and reverberant environments.
We have created and made publicly available a German speech dataset called RescueSpeech.
Our study highlights that the performance attained by state-of-the-art methods in this challenging scenario is still far from reaching an acceptable level.
arXiv Detail & Related papers (2023-06-06T23:04:22Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z) - Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices [15.136348385992047]
We train several voice conversion models using self-supervised speech representations.
Converted voices retain a word error rate within 1% of the original voice.
Experiments on dysarthric speech data show that speech features relevant to articulation, prosody, phonation and phonology can be extracted from anonymized voices.
arXiv Detail & Related papers (2022-04-04T17:48:01Z) - Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis [67.73554826428762]
We propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR.
Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals.
arXiv Detail & Related papers (2022-03-31T17:57:10Z) - Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.