Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based
Augmentation
- URL: http://arxiv.org/abs/2306.04368v1
- Date: Wed, 7 Jun 2023 12:01:46 GMT
- Title: Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based
Augmentation
- Authors: Massa Baali, Ibrahim Almakky, Shady Shehata, Fakhri Karray
- Abstract summary: We aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach.
To this effect, we first propose a signal-based approach to generate dysarthric Arabic speech from healthy Arabic speech.
- We also propose a second-stage Parallel Wave Generative (PWG) adversarial model that is trained on an English dysarthric dataset.
- Score: 4.874780144224057
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite major advancements in Automatic Speech Recognition (ASR), the
state-of-the-art ASR systems struggle to deal with impaired speech even with
high-resource languages. In Arabic, this challenge gets amplified, with added
complexities in collecting data from dysarthric speakers. In this paper, we aim
to improve the performance of Arabic dysarthric automatic speech recognition
through a multi-stage augmentation approach. To this effect, we first propose a
signal-based approach to generate dysarthric Arabic speech from healthy Arabic
speech by modifying its speed and tempo. We also propose a second-stage
Parallel Wave Generative (PWG) adversarial model that is trained on an English
dysarthric dataset to capture language-independent dysarthric speech patterns
and further augment the signal-adjusted speech samples. Furthermore, we propose
fine-tuning and text-correction strategies for the Arabic Conformer at
different dysarthric speech severity levels. Our fine-tuned Conformer achieved
18% Word Error Rate (WER) and 17.2% Character Error Rate (CER) on synthetically
generated dysarthric speech from the Arabic Common Voice speech dataset. This
represents a significant WER improvement of 81.8% compared to the baseline
model trained solely on healthy data. We perform further validation on real
English dysarthric speech, showing a WER improvement of 124% compared to the
baseline trained only on the healthy English LJSpeech dataset.
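The first, signal-based stage above amounts to slowing healthy speech with tempo (duration-only) and speed (duration-plus-pitch) modifications. Below is a minimal illustrative sketch of that kind of perturbation, assuming librosa and soundfile are available; the perturbation factors and file names are placeholders rather than values from the paper, and the second-stage PWG model is not shown.

```python
import librosa
import soundfile as sf

def simulate_dysarthric(in_path: str, out_path: str,
                        tempo_factor: float = 0.7,
                        speed_factor: float = 0.9) -> None:
    # Load at the file's native sample rate.
    y, sr = librosa.load(in_path, sr=None)

    # Tempo modification: stretch the utterance in time (rate < 1 slows it
    # down) without changing its pitch.
    y = librosa.effects.time_stretch(y, rate=tempo_factor)

    # Speed modification: resample to more samples but keep the original
    # sample rate on write, so playback is slower and lower-pitched.
    y = librosa.resample(y, orig_sr=sr, target_sr=int(sr / speed_factor))

    sf.write(out_path, y, sr)

if __name__ == "__main__":
    # Placeholder file names for illustration only.
    simulate_dysarthric("healthy_arabic.wav", "augmented_dysarthric.wav")
```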
Related papers
- Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition [71.87998918300806]
This paper explores approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems.
TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models consistently outperform standalone fine-tuned SSL pre-trained models.
Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
arXiv Detail & Related papers (2024-07-03T08:33:39Z)
- Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition [40.44769351506048]
Perceiver-Prompt is a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model.
We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs.
arXiv Detail & Related papers (2024-06-14T09:36:46Z)
- Accurate synthesis of Dysarthric Speech for ASR data augmentation [5.223856537504927]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation.
arXiv Detail & Related papers (2023-08-16T15:42:24Z)
- Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC).
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that systems incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition [15.136348385992047]
This study explores using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.
We train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model.
Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance (a minimal feature-extraction sketch appears after this list).
arXiv Detail & Related papers (2022-04-04T17:36:01Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
Speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training [4.050982413149992]
The goal of our work is to develop a model for dysarthric-to-healthy speech conversion using a Cycle-consistent GAN.
The generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech.
arXiv Detail & Related papers (2020-01-10T01:40:27Z)
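Several of the papers above (the self-supervised ASR entry at the top of the list and the cross-lingual Wav2Vec/XLSR entry) build ASR systems on self-supervised speech representations. As a rough sketch of that feature-extraction step, assuming the public facebook/wav2vec2-large-xlsr-53 checkpoint and the Hugging Face transformers library (not any cited paper's exact pipeline):

```python
import torch
import librosa
from transformers import Wav2Vec2Model

# Load the multilingual XLSR-53 encoder (hidden size 1024).
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model.eval()

# Wav2Vec2 expects 16 kHz mono input; "utterance.wav" is a placeholder path.
waveform, _ = librosa.load("utterance.wav", sr=16000)
x = torch.tensor(waveform).unsqueeze(0)        # shape: (1, num_samples)
x = (x - x.mean()) / (x.std() + 1e-7)          # zero-mean, unit-variance normalization

with torch.no_grad():
    features = model(x).last_hidden_state      # shape: (1, num_frames, 1024)

# These frame-level features can replace filterbank inputs when training a
# downstream acoustic model for dysarthric ASR.
print(features.shape)
```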