Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric
Speech Recognition
- URL: http://arxiv.org/abs/2201.11571v1
- Date: Thu, 27 Jan 2022 15:22:09 GMT
- Title: Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric
Speech Recognition
- Authors: Mohammad Soleymanpour, Michael T. Johnson, Rahim Soleymanpour, Jeffrey
Berry
- Abstract summary: Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
To have robust dysarthria-specific ASR, sufficient training speech is required.
Recent advances in Text-To-Speech synthesis suggest the possibility of using synthesis for data augmentation.
- Score: 4.637732011720613
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dysarthria is a motor speech disorder often characterized by reduced speech
intelligibility through slow, uncoordinated control of speech production
muscles. Automatic Speech Recognition (ASR) systems may help dysarthric talkers
communicate more effectively. To build robust dysarthria-specific ASR,
sufficient training speech is required, which is not readily available. Recent
advances in multi-speaker end-to-end Text-To-Speech (TTS) synthesis suggest
the possibility of using synthesis for data augmentation. In this
paper, we aim to improve multi-speaker end-to-end TTS systems to synthesize
dysarthric speech for improved training of a dysarthria-specific DNN-HMM ASR.
In the synthesized speech, we add dysarthria severity level and pause insertion
mechanisms alongside other control parameters such as pitch, energy, and
duration. Results show that a DNN-HMM model trained on additional synthetic
dysarthric speech achieves a WER improvement of 12.2% compared to the baseline,
and that the addition of the severity level and pause insertion controls
decreases WER by 6.5%, showing the effectiveness of adding these parameters.
Audio samples are available at
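The severity and pause controls described in the abstract can be sketched as extra conditioning inputs to a TTS model. The following is a minimal, hypothetical Python illustration, not the authors' implementation: the one-hot severity embedding, the pause probabilities, and all function names are assumptions.

```python
import random

SEVERITIES = ["healthy", "mild", "moderate", "severe"]

def severity_onehot(level: str) -> list:
    """One-hot encoding of the dysarthria severity control."""
    return [1.0 if s == level else 0.0 for s in SEVERITIES]

def build_conditioning(pitch_scale, energy_scale, duration_scale, severity):
    """Concatenate prosody controls with the severity embedding, mimicking
    how extra control parameters can be fed to a multi-speaker end-to-end
    TTS model (illustrative only)."""
    return [pitch_scale, energy_scale, duration_scale] + severity_onehot(severity)

def insert_pauses(words, severity, rng=random.Random(0)):
    """Rule-based pause insertion: higher severity -> more intra-sentence
    pauses, approximating dysarthric speaking patterns."""
    p_pause = {"healthy": 0.0, "mild": 0.1, "moderate": 0.3, "severe": 0.5}[severity]
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i < len(words) - 1 and rng.random() < p_pause:
            out.append("<pause>")
    return out
```

For example, `build_conditioning(0.9, 1.1, 1.4, "moderate")` produces a 7-dimensional control vector with slowed duration and moderate severity, and `insert_pauses` peppers the input text with pause tokens before synthesis.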
Related papers
- Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition [71.87998918300806]
This paper explores approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems.
TDNN systems constructed by integrating domain-adapted HuBERT, wav2vec2-conformer or multi-lingual XLSR models consistently outperform standalone fine-tuned SSL pre-trained models.
Consistent improvements in Alzheimer's Disease detection accuracy are also obtained using the DementiaBank Pitt elderly speech recognition outputs.
arXiv Detail & Related papers (2024-07-03T08:33:39Z)
- UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization [60.43992089087448]
Dysarthric speech reconstruction systems aim to automatically convert dysarthric speech into normal-sounding speech.
We propose a Unit-DSR system, which harnesses the powerful domain-adaptation capacity of HuBERT for training efficiency improvement.
Compared with NED approaches, the Unit-DSR system only consists of a speech unit normalizer and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded sub-modules or auxiliary tasks.
arXiv Detail & Related papers (2024-01-26T06:08:47Z)
- Accurate synthesis of Dysarthric Speech for ASR data augmentation [5.223856537504927]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation.
arXiv Detail & Related papers (2023-08-16T15:42:24Z)
- Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users.
Motivated by the spectro-temporal differences between dysarthric, elderly, and normal speech, novel spectro-temporal subspace basis deep embedding features are derived using SVD of the speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
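Speed perturbation of the kind mentioned above is typically done by resampling the waveform on a stretched time axis. A minimal sketch follows, using linear interpolation in place of a proper resampler; the function names and the standard 3-way factor set are illustrative, not taken from the paper.

```python
import numpy as np

def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Resample the waveform so it plays back `factor` times faster
    (factor > 1 shortens the signal, factor < 1 lengthens it).
    Linear interpolation stands in for a proper band-limited resampler."""
    n_out = int(round(len(waveform) / factor))
    src_positions = np.linspace(0, len(waveform) - 1, n_out)
    return np.interp(src_positions, np.arange(len(waveform)), waveform)

def augment(waveform, factors=(0.9, 1.0, 1.1)):
    """Common 3-way speed perturbation used for ASR data augmentation:
    return one copy per perturbation factor."""
    return [speed_perturb(waveform, f) for f in factors]
```

Each utterance thereby yields three training copies; the 0.9x copy is longer and lower-pitched, the 1.1x copy shorter and higher-pitched, roughly simulating speaking-rate variation.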
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Silent Speech Interfaces for Speech Restoration: A Review [59.68902463890532]
Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders.
SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication.
Most present-day SSIs have only been validated in laboratory settings for healthy users.
arXiv Detail & Related papers (2020-09-04T11:05:50Z)
- Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion [17.520533341887642]
We propose a novel transfer learning approach using Tacotron and WaveRNN based TTS synthesis.
The proposed speech system exploits two modification strategies: (a) Lombard speaking style data and (b) Spectral Shaping and Dynamic Range Compression (SSDRC).
As quantified by the Intelligibility in Bits measure, the proposed Lombard-SSDRC TTS system achieves significant relative intelligibility improvements of 110% to 130% in speech-shaped noise (SSN) and 47% to 140% in competing-speaker noise (CSN).
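The dynamic-range-compression half of SSDRC can be illustrated with a toy envelope-based compressor; this is only a sketch under assumed parameters (threshold, ratio), not the paper's SSDRC pipeline, which also applies spectral shaping.

```python
import numpy as np

def dynamic_range_compress(x: np.ndarray, threshold=0.5, ratio=4.0, eps=1e-8):
    """Toy dynamic range compression: samples whose magnitude exceeds
    `threshold` are attenuated so their excess over the threshold is
    divided by `ratio`, boosting the relative level of quiet segments."""
    env = np.abs(x)
    gain = np.ones_like(x)
    over = env > threshold
    # Compressed level: threshold + (excess / ratio), applied as a gain.
    target = threshold + (env[over] - threshold) / ratio
    gain[over] = target / (env[over] + eps)
    return x * gain
```

With the defaults, a peak at amplitude 1.0 is reduced to 0.625 while samples below 0.5 pass through unchanged, which is the basic mechanism by which DRC raises average intelligibility at a fixed peak level.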
arXiv Detail & Related papers (2020-08-13T10:51:56Z)
- Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training [4.050982413149992]
The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN.
The generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech.
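The content-preserving constraint behind this kind of CycleGAN conversion is the cycle-consistency loss. A toy linear version is sketched below; the real generators are neural networks operating on spectral features, and the matrices here are stand-ins chosen only so the loss is well defined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generators" as linear maps over spectral feature vectors:
# G maps dysarthric -> healthy, F maps healthy -> dysarthric.
DIM = 8
G = np.eye(DIM) + 0.1 * rng.normal(size=(DIM, DIM))
F = np.linalg.inv(G)  # perfect inverse, so the cycle loss is ~0 here

def cycle_consistency_loss(x_dys: np.ndarray) -> float:
    """L1 cycle-consistency loss ||F(G(x)) - x||_1: mapping dysarthric
    features to the healthy domain and back should recover the input,
    which keeps the conversion content-preserving during GAN training."""
    reconstructed = F @ (G @ x_dys)
    return float(np.abs(reconstructed - x_dys).mean())
```

In actual training this term is added to the adversarial losses of both generators; here, because F is constructed as the exact inverse of G, the loss is zero up to floating-point error.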
arXiv Detail & Related papers (2020-01-10T01:40:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.