The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for
Improved Dysarthric Speech Recognition
- URL: http://arxiv.org/abs/2201.04908v1
- Date: Thu, 13 Jan 2022 11:56:13 GMT
- Title: The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for
Improved Dysarthric Speech Recognition
- Authors: Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg
- Abstract summary: We investigate several existing generative adversarial network (GAN)-based voice conversion methods and a new state-of-the-art one for enhancing dysarthric speech for improved dysarthric speech recognition.
We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods.
- Score: 24.07996218669781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we investigate several existing generative adversarial
network (GAN)-based voice conversion methods and a new state-of-the-art one for
enhancing dysarthric speech for improved dysarthric speech recognition. We
compare key components of existing methods as part of a rigorous ablation study
to find the most effective solution to improve dysarthric speech recognition.
We find that straightforward signal processing methods such as stationary noise
removal and vocoder-based time stretching lead to dysarthric speech recognition
results comparable to those obtained when using state-of-the-art GAN-based
voice conversion methods as measured using a phoneme recognition task.
Additionally, our proposed combination of MaskCycleGAN-VC and time-stretched
enhancement improves the phoneme recognition results for certain dysarthric
speakers compared to our time-stretched baseline.
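As a concrete illustration of the signal-processing baselines above, the following is a minimal sketch of stationary noise removal followed by vocoder-based time stretching. The library choices (the noisereduce package and librosa's phase-vocoder time stretch), the file paths, and the stretch rate are illustrative assumptions, not the authors' exact setup.

```python
# Sketch of the two signal-processing baselines described in the abstract:
# stationary noise removal followed by vocoder-based time stretching.
# Library choices (noisereduce, librosa's phase vocoder) are illustrative
# stand-ins, not necessarily what the authors used.
import librosa
import noisereduce as nr
import soundfile as sf

def enhance(in_path: str, out_path: str, stretch_rate: float = 0.9, sr: int = 16000):
    """Denoise and time-stretch one utterance.

    stretch_rate < 1 lengthens the signal, > 1 shortens it; the useful
    direction and amount per dysarthric speaker is an open choice here.
    """
    y, _ = librosa.load(in_path, sr=sr)

    # Stationary noise removal: estimate a constant noise profile over the
    # whole recording and subtract it spectrally.
    y_clean = nr.reduce_noise(y=y, sr=sr, stationary=True)

    # Vocoder-based time stretching: phase-vocoder stretch that changes
    # duration (speaking rate) without shifting pitch.
    y_stretched = librosa.effects.time_stretch(y_clean, rate=stretch_rate)

    sf.write(out_path, y_stretched, sr)

if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    enhance("dysarthric_utt.wav", "enhanced_utt.wav", stretch_rate=0.9)
```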
Related papers
- Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition [40.44769351506048]
Perceiver-Prompt is a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model.
We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs.
arXiv Detail & Related papers (2024-06-14T09:36:46Z)
- UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization [60.43992089087448]
Dysarthric speech reconstruction systems aim to automatically convert dysarthric speech into normal-sounding speech.
We propose a Unit-DSR system, which harnesses the powerful domain-adaptation capacity of HuBERT for training efficiency improvement.
Compared with NED approaches, the Unit-DSR system only consists of a speech unit normalizer and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded sub-modules or auxiliary tasks.
arXiv Detail & Related papers (2024-01-26T06:08:47Z) - Accurate synthesis of Dysarthric Speech for ASR data augmentation [5.223856537504927]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility.
This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation.
arXiv Detail & Related papers (2023-08-16T15:42:24Z) - Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN)-based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z) - Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to a 2.92% absolute word error rate (WER) reduction (a minimal sketch of speed perturbation follows this entry).
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
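The entry above reports gains from speed-perturbation-based augmentation. The snippet below is a minimal sketch of the standard resampling implementation of speed perturbation; the 0.9/1.0/1.1 factors, file name, and use of librosa are assumptions and not taken from the paper.

```python
# Minimal sketch of speed perturbation as used for ASR data augmentation:
# changing playback speed (tempo and pitch together) via resampling.
# The 0.9/1.0/1.1 factors follow common recipes, not the paper's setup.
import librosa

def speed_perturb(y, sr, factor):
    """Return y played back `factor` times faster (shorter, higher-pitched).

    Implemented as resampling from sr to sr/factor and then relabelling the
    result as sr, which matches sox's `speed` effect.
    """
    if factor == 1.0:
        return y
    return librosa.resample(y, orig_sr=sr, target_sr=int(sr / factor))

# Hypothetical file name for illustration only.
y, sr = librosa.load("utt.wav", sr=16000)
augmented = {f: speed_perturb(y, sr, f) for f in (0.9, 1.0, 1.1)}
```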
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition [14.544989316741091]
We propose a deep learning-based algorithm to improve the performance of automatic speech recognition systems for aphasia, apraxia, and dysarthria speech.
We demonstrate a significant decoding performance improvement of more than 50% at test time on an isolated speech recognition task.
The results are a first step towards showing that non-invasive neural signals can be used to design a real-time, robust speech prosthetic for stroke survivors recovering from aphasia, apraxia, and dysarthria.
arXiv Detail & Related papers (2021-02-28T03:27:02Z)
- High Fidelity Speech Regeneration with Application to Speech Enhancement [96.34618212590301]
We propose a wav-to-wav generative model for speech that can generate 24 kHz speech in real time.
Inspired by voice conversion methods, we train to augment the speech characteristics while preserving the identity of the source.
arXiv Detail & Related papers (2021-01-31T10:54:27Z)
- Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [64.9317368575585]
This paper proposes a gated recurrent fusion (GRF) method with a joint training framework for robust end-to-end ASR.
The GRF algorithm is used to dynamically combine the noisy and enhanced features (a generic gated-fusion sketch follows this entry).
The proposed method achieves a relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and transformer method.
arXiv Detail & Related papers (2020-11-09T08:52:05Z)
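The GRF entry above describes dynamically combining noisy and enhanced features through a learned gate. The module below is a generic gated-fusion sketch in PyTorch; the layer names, dimensions, and gating formulation are assumptions and do not reproduce the paper's GRF architecture.

```python
# Generic gated fusion of noisy and enhanced feature streams, sketching the
# idea of dynamically combining the two; this is not the paper's exact GRF.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        # Gate computed from both streams, one weight per feature dimension.
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, noisy: torch.Tensor, enhanced: torch.Tensor) -> torch.Tensor:
        # noisy, enhanced: (batch, time, feat_dim)
        g = self.gate(torch.cat([noisy, enhanced], dim=-1))
        # Per-dimension convex combination of the two streams.
        return g * noisy + (1.0 - g) * enhanced

fusion = GatedFusion(feat_dim=80)       # e.g. 80-dim filterbank features
noisy = torch.randn(4, 200, 80)
enhanced = torch.randn(4, 200, 80)
fused = fusion(noisy, enhanced)         # (4, 200, 80)
```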
- Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training [4.050982413149992]
The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN.
The generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech.
arXiv Detail & Related papers (2020-01-10T01:40:27Z)