Improving Dysarthric Speech Intelligibility Using Cycle-consistent
Adversarial Training
- URL: http://arxiv.org/abs/2001.04260v1
- Date: Fri, 10 Jan 2020 01:40:27 GMT
- Title: Improving Dysarthric Speech Intelligibility Using Cycle-consistent
Adversarial Training
- Authors: Seung Hee Yang, Minhwa Chung
- Abstract summary: The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN.
The generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech.
- Score: 4.050982413149992
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dysarthria is a motor speech impairment affecting millions of people.
Dysarthric speech can be far less intelligible than that of non-dysarthric
speakers, causing significant communication difficulties. The goal of our work
is to develop a model for dysarthric to healthy speech conversion using
Cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean
utterances that were recorded for the purpose of automatic recognition of voice
keyboard in a previous study, the generator is trained to transform dysarthric
to healthy speech in the spectral domain, which is then converted back to
speech. Objective evaluation using automatic speech recognition of the
generated utterance on a held-out test set shows that the recognition
performance is improved compared with the original dysarthric speech after
performing adversarial training, as the absolute WER has been lowered by 33.4%.
It demonstrates that the proposed GAN-based conversion method is useful for
improving dysarthric speech intelligibility.
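The abstract reports its result as an absolute reduction in word error rate (WER). As a minimal sketch of how such a figure is typically computed, the Python below implements the standard word-level edit-distance definition of WER and contrasts absolute with relative reduction. The function names and the example values are hypothetical illustrations, not the authors' evaluation code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(wer_before: float, wer_after: float) -> float:
    """Difference between the two WER values (percentage points if scaled by 100)."""
    return wer_before - wer_after


def relative_reduction(wer_before: float, wer_after: float) -> float:
    """Reduction expressed as a fraction of the starting WER."""
    return (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    # Illustrative strings only: one substitution and one deletion over four words.
    print(word_error_rate("a b c d", "a x c"))
    # Illustrative WER values only, not results from the paper.
    print(absolute_reduction(0.8, 0.4), relative_reduction(0.8, 0.4))
```

An absolute reduction is a difference in percentage points, so the same absolute figure can correspond to very different relative improvements depending on the starting WER.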
Related papers
- Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition [40.44769351506048]
Perceiver-Prompt is a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model.
We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs.
arXiv Detail & Related papers (2024-06-14T09:36:46Z)
- UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization [60.43992089087448]
Dysarthric speech reconstruction systems aim to automatically convert dysarthric speech into normal-sounding speech.
We propose a Unit-DSR system, which harnesses the powerful domain-adaptation capacity of HuBERT for training efficiency improvement.
Compared with NED approaches, the Unit-DSR system only consists of a speech unit normalizer and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded sub-modules or auxiliary tasks.
arXiv Detail & Related papers (2024-01-26T06:08:47Z)
- Accurate synthesis of Dysarthric Speech for ASR data augmentation [5.223856537504927]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation.
arXiv Detail & Related papers (2023-08-16T15:42:24Z)
- Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation [4.874780144224057]
We aim to improve the performance of Arabic dysarthric automatic speech recognition through a multi-stage augmentation approach.
To this effect, we first propose a signal-based approach to generate dysarthric Arabic speech from healthy Arabic speech.
We also propose a second stage Parallel Wave Generative (PWG) adversarial model that is trained on an English dysarthric dataset.
arXiv Detail & Related papers (2023-06-07T12:01:46Z)
- Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users.
Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech, novel spectro-temporal subspace basis deep embedding features are derived using SVD of the speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
Speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition [65.25325641528701]
Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by SVD decomposition of speech spectrum are proposed.
Experiments conducted on the UASpeech corpus suggest the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i-vector adaptation by up to 2.63% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation.
arXiv Detail & Related papers (2022-01-14T16:56:43Z)
- The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition [24.07996218669781]
We investigate existing methods and a new state-of-the-art generative adversarial network (GAN)-based voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition.
We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods.
arXiv Detail & Related papers (2022-01-13T11:56:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.