Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based
Augmentation
- URL: http://arxiv.org/abs/2306.04368v1
- Date: Wed, 7 Jun 2023 12:01:46 GMT
- Title: Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based
Augmentation
- Authors: Massa Baali, Ibrahim Almakky, Shady Shehata, Fakhri Karray
- Abstract summary: We aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach.
To this effect, we first propose a signal-based approach to generate dysarthric Arabic speech from healthy Arabic speech.
- We also propose a second-stage Parallel Wave Generative (PWG) adversarial model that is trained on an English dysarthric dataset.
- Score: 4.874780144224057
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite major advancements in Automatic Speech Recognition (ASR), the
state-of-the-art ASR systems struggle to deal with impaired speech even with
high-resource languages. In Arabic, this challenge gets amplified, with added
complexities in collecting data from dysarthric speakers. In this paper, we aim
to improve the performance of Arabic dysarthric automatic speech recognition
through a multi-stage augmentation approach. To this effect, we first propose a
signal-based approach to generate dysarthric Arabic speech from healthy Arabic
speech by modifying its speed and tempo. We also propose a second-stage
Parallel Wave Generative (PWG) adversarial model that is trained on an English
dysarthric dataset to capture language-independent dysarthric speech patterns
and further augment the signal-adjusted speech samples. Furthermore, we propose
fine-tuning and text-correction strategies for the Arabic Conformer at
different dysarthric speech severity levels. Our fine-tuned Conformer achieved
18% Word Error Rate (WER) and 17.2% Character Error Rate (CER) on synthetically
generated dysarthric speech from the Arabic Common Voice speech dataset. This
represents a significant WER improvement of 81.8% compared to the baseline
model trained solely on healthy data. We perform further validation on real
English dysarthric speech, showing a WER improvement of 124% compared to the
baseline trained only on the healthy English LJSpeech dataset.
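The first, signal-based stage above amounts to slowing healthy speech with tempo (duration-only) and speed (duration-plus-pitch) modifications. Below is a minimal illustrative sketch of that kind of perturbation, assuming librosa and soundfile are available; the perturbation factors and file names are placeholders rather than values from the paper, and the second-stage PWG model is not shown.

```python
import librosa
import soundfile as sf

def simulate_dysarthric(in_path: str, out_path: str,
                        tempo_factor: float = 0.7,
                        speed_factor: float = 0.9) -> None:
    # Load at the file's native sample rate.
    y, sr = librosa.load(in_path, sr=None)

    # Tempo modification: stretch the utterance in time (rate < 1 slows it
    # down) without changing its pitch.
    y = librosa.effects.time_stretch(y, rate=tempo_factor)

    # Speed modification: resample to more samples but keep the original
    # sample rate on write, so playback is slower and lower-pitched.
    y = librosa.resample(y, orig_sr=sr, target_sr=int(sr / speed_factor))

    sf.write(out_path, y, sr)

if __name__ == "__main__":
    # Placeholder file names for illustration only.
    simulate_dysarthric("healthy_arabic.wav", "augmented_dysarthric.wav")
```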
Related papers
- Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition [71.87998918300806]
This paper explores approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems.
TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models consistently outperform standalone fine-tuned SSL pre-trained models.
Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
arXiv Detail & Related papers (2024-07-03T08:33:39Z)
- Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition [40.44769351506048]
Perceiver-Prompt is a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model.
We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs.
arXiv Detail & Related papers (2024-06-14T09:36:46Z)
- Accurate synthesis of Dysarthric Speech for ASR data augmentation [5.223856537504927]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation.
arXiv Detail & Related papers (2023-08-16T15:42:24Z)
- Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC).
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that systems incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition [15.136348385992047]
This study explores using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.
We train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model.
Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance (a minimal feature-extraction sketch appears after this list).
arXiv Detail & Related papers (2022-04-04T17:36:01Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
Speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training [4.050982413149992]
The goal of our work is to develop a model for dysarthric-to-healthy speech conversion using a Cycle-consistent GAN.
The generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech.
arXiv Detail & Related papers (2020-01-10T01:40:27Z)
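Several of the papers above (the self-supervised ASR entry at the top of the list and the cross-lingual Wav2Vec/XLSR entry) build ASR systems on self-supervised speech representations. As a rough sketch of that feature-extraction step, assuming the public facebook/wav2vec2-large-xlsr-53 checkpoint and the Hugging Face transformers library (not any cited paper's exact pipeline):

```python
import torch
import librosa
from transformers import Wav2Vec2Model

# Load the multilingual XLSR-53 encoder (hidden size 1024).
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model.eval()

# Wav2Vec2 expects 16 kHz mono input; "utterance.wav" is a placeholder path.
waveform, _ = librosa.load("utterance.wav", sr=16000)
x = torch.tensor(waveform).unsqueeze(0)        # shape: (1, num_samples)
x = (x - x.mean()) / (x.std() + 1e-7)          # zero-mean, unit-variance normalization

with torch.no_grad():
    features = model(x).last_hidden_state      # shape: (1, num_frames, 1024)

# These frame-level features can replace filterbank inputs when training a
# downstream acoustic model for dysarthric ASR.
print(features.shape)
```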