Accurate synthesis of Dysarthric Speech for ASR data augmentation
- URL: http://arxiv.org/abs/2308.08438v1
- Date: Wed, 16 Aug 2023 15:42:24 GMT
- Title: Accurate synthesis of Dysarthric Speech for ASR data augmentation
- Authors: Mohammad Soleymanpour, Michael T. Johnson, Rahim Soleymanpour, Jeffrey Berry
- Abstract summary: Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation.
- Score: 5.223856537504927
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dysarthria is a motor speech disorder often characterized by reduced speech
intelligibility through slow, uncoordinated control of speech production
muscles. Automatic speech recognition (ASR) systems can help dysarthric talkers
communicate more effectively. However, robust dysarthria-specific ASR requires
a significant amount of training speech, which is not readily available for
dysarthric talkers. This paper presents a new dysarthric speech synthesis
method for the purpose of ASR training data augmentation. Differences in
prosodic and acoustic characteristics of dysarthric spontaneous speech at
varying severity levels are important components for dysarthric speech
modeling, synthesis, and augmentation. For dysarthric speech synthesis, a
modified neural multi-talker TTS is implemented by adding a dysarthria severity
level coefficient and a pause insertion model to synthesize dysarthric speech
for varying severity levels. To evaluate the effectiveness for synthesis of
training data for ASR, dysarthria-specific speech recognition was used. Results
show that a DNN-HMM model trained on additional synthetic dysarthric speech
achieves a WER improvement of 12.2% compared to the baseline, and that the
addition of the severity level and pause insertion controls decreases WER by
6.5%, showing the effectiveness of adding these parameters. Overall results on
the TORGO database demonstrate that using dysarthric synthetic speech to
increase the amount of dysarthric-patterned speech for training has a significant
impact on dysarthric ASR systems. In addition, we conducted a
subjective evaluation of the dysarthric-ness and similarity of
synthesized speech. Our subjective evaluation shows that the perceived
dysarthric-ness of synthesized speech is similar to that of true dysarthric
speech, especially for higher levels of dysarthria.
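The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is conventionally defined (word-level edit distance divided by the number of reference words), the following may help. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' scoring pipeline, and the abstract does not say whether the reported 12.2% and 6.5% figures are absolute or relative changes.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Illustrative only: tokenization is plain whitespace splitting,
    which is an assumption, not something the paper specifies.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.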
Related papers
- Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition [71.87998918300806]
This paper explores approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems.
TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models consistently outperform standalone fine-tuned SSL pre-trained models.
Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
arXiv Detail & Related papers (2024-07-03T08:33:39Z)
- Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition [40.44769351506048]
Perceiver-Prompt is a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model.
We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs.
arXiv Detail & Related papers (2024-06-14T09:36:46Z)
- UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization [60.43992089087448]
Dysarthric speech reconstruction systems aim to automatically convert dysarthric speech into normal-sounding speech.
We propose a Unit-DSR system, which harnesses the powerful domain-adaptation capacity of HuBERT for training efficiency improvement.
Compared with NED approaches, the Unit-DSR system only consists of a speech unit normalizer and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded sub-modules or auxiliary tasks.
arXiv Detail & Related papers (2024-01-26T06:08:47Z)
- Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition [30.885165674448352]
This paper presents a novel set of speaker-dependent GAN-based data augmentation approaches for elderly and dysarthric speech recognition.
GAN based data augmentation approaches consistently outperform the baseline speed perturbation method by up to 0.91% and 3.0% absolute.
Consistent performance improvements are retained after applying LHUC based speaker adaptation.
arXiv Detail & Related papers (2022-05-13T04:29:49Z)
- Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users.
Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech.
Novel spectrotemporal subspace basis deep embedding features derived using SVD speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
Speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
- Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition [4.637732011720613]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
To have robust dysarthria-specific ASR, sufficient training speech is required.
Recent advances in Text-To-Speech synthesis suggest the possibility of using synthesis for data augmentation.
arXiv Detail & Related papers (2022-01-27T15:22:09Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute word error rate (WER)
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training [4.050982413149992]
The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN.
The generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech.
arXiv Detail & Related papers (2020-01-10T01:40:27Z)