Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of
Speech Sound Disorders in Korean children
- URL: http://arxiv.org/abs/2403.08187v1
- Date: Wed, 13 Mar 2024 02:20:05 GMT
- Title: Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of
Speech Sound Disorders in Korean children
- Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang,
Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang and Hosung
Nam
- Abstract summary: This study presents a model of automatic speech recognition designed to diagnose pronunciation issues in children with speech sound disorders.
The model's predictions of the pronunciations of the words matched the human annotations with about 90% accuracy.
- Score: 4.840474991678558
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study presents a model of automatic speech recognition (ASR) designed to
diagnose pronunciation issues in children with speech sound disorders (SSDs) to
replace manual transcriptions in clinical procedures. Since ASR models trained
for general purposes primarily map input speech onto real words, employing a
well-known high-performance ASR model to evaluate pronunciation in children
with SSDs is impractical. We fine-tuned the wav2vec 2.0 XLS-R model to
recognize speech as pronounced rather than as existing words. The model was
fine-tuned with a speech dataset from 137 children with inadequate speech
production pronouncing 73 Korean words selected for actual clinical diagnosis.
The model's predictions of the pronunciations of the words matched the human
annotations with about 90% accuracy. While the model still requires improvement
in recognizing unclear pronunciation, this study demonstrates that ASR models
can streamline complex pronunciation error diagnostic procedures in clinical
fields.
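As a rough illustration of the fine-tuning recipe described above, the sketch below adapts an XLS-R checkpoint with a new CTC head over a phoneme-level vocabulary, so that the model transcribes what was actually said rather than snapping to real words. It assumes the HuggingFace transformers API; the vocab.json file (e.g. Korean jamo units), checkpoint size, and single training step are illustrative assumptions, not the authors' released configuration.

```python
# Minimal sketch: fine-tuning wav2vec 2.0 XLS-R with a CTC head over
# phoneme-level units so the output reflects pronunciation, not real words.
# Assumes a hypothetical vocab.json mapping phoneme symbols to ids, e.g.
# {"<pad>": 0, "<unk>": 1, "ㄱ": 2, "ㅏ": 3, "ㅂ": 4, ...}
import torch
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="<unk>", pad_token="<pad>", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0,
    do_normalize=True, return_attention_mask=True,
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",           # pretrained cross-lingual encoder
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),       # fresh CTC head sized to the phoneme set
)
model.freeze_feature_encoder()                 # common practice when fine-tuning

# One illustrative training step on a single (waveform, phoneme-string) pair.
waveform = torch.randn(16_000 * 3)             # placeholder: 3 s of 16 kHz audio
inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
labels = processor.tokenizer("ㄱㅏㅂㅏ", return_tensors="pt").input_ids
loss = model(input_values=inputs.input_values, labels=labels).loss
loss.backward()                                # in practice: Trainer + a CTC data collator
```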
Related papers
- Self-supervised Speech Models for Word-Level Stuttered Speech Detection [66.46810024006712]
We introduce a word-level stuttering speech detection model leveraging self-supervised speech models.
Our evaluation demonstrates that our model surpasses previous approaches in word-level stuttering speech detection.
arXiv Detail & Related papers (2024-09-16T20:18:20Z)
- Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer [59.57249127943914]
We present a multilingual Audio-Visual Speech Recognition model incorporating several enhancements to improve performance and audio noise robustness.
We increase the amount of audio-visual training data for six distinct languages, generating automatic transcriptions of unlabelled multilingual datasets.
Our proposed model achieves new state-of-the-art performance on the LRS3 dataset, reaching a WER of 0.8%.
arXiv Detail & Related papers (2024-03-14T01:16:32Z)
- Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition [15.136348385992047]
This study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.
We train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model.
Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance.
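A minimal sketch of this feature-extraction setup, assuming a frozen XLSR checkpoint from HuggingFace and a hypothetical frame-level acoustic model on top (the paper's actual downstream architecture may differ):

```python
# Sketch: frozen XLSR / wav2vec 2.0 hidden states as acoustic features for a
# downstream dysarthric-ASR acoustic model. The checkpoint name is a real
# HuggingFace repo; the classifier on top is an illustrative assumption.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, do_normalize=True
)
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
encoder.eval()  # frozen: representations are used as features, not fine-tuned

waveform = torch.randn(16_000 * 2)  # placeholder: 2 s of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    # (batch, frames, hidden) contextual representations, ~20 ms per frame
    features = encoder(inputs.input_values).last_hidden_state

# Hypothetical frame-level acoustic model over the extracted features,
# e.g. predicting phone posteriors for a hybrid ASR system.
num_phones = 40  # assumed phone inventory size
acoustic_model = torch.nn.Sequential(
    torch.nn.Linear(features.size(-1), 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, num_phones),
)
log_posteriors = torch.log_softmax(acoustic_model(features), dim=-1)
```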
arXiv Detail & Related papers (2022-04-04T17:36:01Z)
- Automatic Speech recognition for Speech Assessment of Preschool Children [4.554894288663752]
The acoustic and linguistic features of preschool speech are investigated in this study.
Wav2Vec 2.0 is a paradigm that could be used to build a robust end-to-end speech recognition system.
arXiv Detail & Related papers (2022-03-24T07:15:24Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Influence of ASR and Language Model on Alzheimer's Disease Detection [2.4698886064068555]
We analyse the use of a SotA ASR system to transcribe participants' spoken descriptions of a picture.
We study the influence of a language model, which tends to correct non-standard sequences of words, by comparing ASR decoding with and without one.
The proposed system combines acoustic -- based on prosody and voice quality -- and lexical features based on the first occurrence of the most common words.
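A toy sketch of this early-fusion idea, with placeholder acoustic and lexical feature matrices and a scikit-learn classifier standing in for whatever the authors actually used:

```python
# Illustrative sketch: concatenate acoustic (prosody/voice-quality) descriptors
# with lexical features derived from an ASR transcript and train a simple
# classifier. All feature values and the classifier choice are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 100                                   # hypothetical participants
acoustic = rng.normal(size=(n, 8))        # e.g. F0 stats, jitter, shimmer, pauses
lexical = rng.normal(size=(n, 5))         # e.g. first-occurrence / word-frequency stats
labels = rng.integers(0, 2, size=n)       # AD vs control (placeholder labels)

features = np.hstack([acoustic, lexical])  # early fusion of the two feature views
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, features, labels, cv=5)
print("cross-validated accuracy:", scores.mean())
```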
arXiv Detail & Related papers (2021-09-20T10:41:39Z)
- Experiments of ASR-based mispronunciation detection for children and adult English learners [7.083737676329174]
We develop a mispronunciation assessment system that checks the pronunciation of non-native English speakers.
We present an evaluation of the non-native pronunciation observed in phonetically annotated speech corpora.
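The core of ASR-based mispronunciation detection is comparing the recognized phone sequence against the canonical one; the sketch below does this with a plain Levenshtein alignment over hypothetical phone strings (the paper's own scoring pipeline is not specified here):

```python
# Illustrative sketch: align the phone sequence recognized by an ASR system
# against the canonical phone sequence of the prompt and flag substitutions,
# deletions, and insertions as mispronunciations.

def align_phones(canonical: list[str], recognized: list[str]):
    """Levenshtein alignment; returns the list of edit operations."""
    m, n = len(canonical), len(recognized)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if canonical[i - 1] == recognized[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match / substitution
    # Backtrace to recover the operations.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
            0 if canonical[i - 1] == recognized[j - 1] else 1
        ):
            kind = "ok" if canonical[i - 1] == recognized[j - 1] else "sub"
            ops.append((kind, canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", canonical[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, recognized[j - 1]))
            j -= 1
    return list(reversed(ops))

# "think" pronounced with /t/ for /th/: flagged as a substitution.
print(align_phones(["th", "ih", "ng", "k"], ["t", "ih", "ng", "k"]))
```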
arXiv Detail & Related papers (2021-04-13T07:24:05Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
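The underlying BPE-dropout mechanism is easy to demonstrate: SentencePiece exposes stochastic BPE segmentation, so the same word can yield different subword (acoustic-unit) sequences across training epochs. A sketch under assumed corpus, vocabulary, and dropout settings (not the paper's exact pipeline):

```python
# Sketch of BPE-dropout-style segmentation with SentencePiece, which supports
# stochastic BPE segmentation via enable_sampling/alpha. The corpus file,
# vocabulary size, and dropout rate are illustrative assumptions.
import sentencepiece as spm

# Train a BPE model on a hypothetical corpus file (one sentence per line).
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="bpe", vocab_size=500, model_type="bpe"
)
sp = spm.SentencePieceProcessor(model_file="bpe.model")

text = "merhaba dünya"
# Deterministic segmentation: always the same subword units.
print(sp.encode(text, out_type=str))
# BPE-dropout: merges are skipped with probability alpha, so the same word
# yields varying segmentations across epochs, augmenting the unit inventory.
for _ in range(3):
    print(sp.encode(text, out_type=str, enable_sampling=True, alpha=0.1))
```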
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British-English-speaking PWA, the system's accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-02-10T13:00:29Z)
- Data augmentation using prosody and false starts to recognize non-native children's speech [12.911954427107977]
This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition.
The task is to recognize non-native speech from children of various age groups given a limited amount of speech.
arXiv Detail & Related papers (2020-08-29T05:32:32Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
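A heavily simplified sketch of that contrastive task: for each masked frame, score the true latent against distractor latents sampled from other time steps and apply an InfoNCE loss (the real model adds quantization, span masking, and a codebook diversity term):

```python
# Toy illustration of the wav2vec 2.0 / XLSR contrastive objective: predict
# the true latent at each masked frame against sampled distractors. Shapes
# and the temperature are illustrative, not the published hyperparameters.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
frames, dim, num_negatives, temperature = 50, 256, 10, 0.1

context = torch.randn(frames, dim)   # transformer outputs at masked positions
targets = torch.randn(frames, dim)   # true (quantized) latents for those frames

# Sample distractor latents uniformly from other time steps.
neg_idx = torch.randint(0, frames, (frames, num_negatives))
negatives = targets[neg_idx]                                       # (frames, K, dim)
candidates = torch.cat([targets.unsqueeze(1), negatives], dim=1)   # true latent first

# Cosine similarity between each context vector and its candidate set.
logits = F.cosine_similarity(
    context.unsqueeze(1), candidates, dim=-1
) / temperature                                                    # (frames, K + 1)

# InfoNCE: the true latent (index 0) should outscore the distractors.
loss = F.cross_entropy(logits, torch.zeros(frames, dtype=torch.long))
print(float(loss))
```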
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.