Speaker Identity Preservation in Dysarthric Speech Reconstruction by
Adversarial Speaker Adaptation
- URL: http://arxiv.org/abs/2202.09082v1
- Date: Fri, 18 Feb 2022 08:59:36 GMT
- Title: Speaker Identity Preservation in Dysarthric Speech Reconstruction by
Adversarial Speaker Adaptation
- Authors: Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu,
Helen Meng
- Abstract summary: Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
Speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA)
- Score: 59.41186714127256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dysarthric speech reconstruction (DSR), which aims to improve the quality of
dysarthric speech, remains a challenge, not only because we need to restore the
speech to be normal, but also must preserve the speaker's identity. The speaker
representation extracted by the speaker encoder (SE) optimized for speaker
verification has been explored to control the speaker identity. However, the SE
may not be able to fully capture the characteristics of dysarthric speakers
that are previously unseen. To address this research problem, we propose a
novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
The primary task of ASA fine-tunes the SE with the speech of the target
dysarthric speaker to effectively capture identity-related information, and the
secondary task applies adversarial training to avoid the incorporation of
abnormal speaking patterns into the reconstructed speech, by regularizing the
distribution of reconstructed speech to be close to that of reference speech
with high quality. Experiments show that the proposed approach can achieve
enhanced speaker similarity and comparable speech naturalness with a strong
baseline approach. Compared with dysarthric speech, the reconstructed speech
achieves 22.3% and 31.5% absolute word error rate reduction for speakers with
moderate and moderate-severe dysarthria respectively. Our demo page is released
here: https://wendison.github.io/ASA-DSR-demo/
Related papers
- Use of Speech Impairment Severity for Dysarthric Speech Recognition [37.93801885333925]
This paper proposes a novel set of techniques to use both severity and speaker-identity in dysarthric speech recognition.
Experiments conducted on UASpeech suggest incorporating speech impairment severity into state-of-the-art hybrid DNN, E2E Conformer and pre-trained Wav2vec 2.0 ASR systems.
arXiv Detail & Related papers (2023-05-18T02:42:59Z) - Cross-lingual Self-Supervised Speech Representations for Improved
Dysarthric Speech Recognition [15.136348385992047]
This study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech.
We train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model.
Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance.
arXiv Detail & Related papers (2022-04-04T17:36:01Z) - Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric
and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users.
Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech.
Novel spectrotemporal subspace basis deep embedding features derived using SVD speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z) - Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute word error rate (WER)
arXiv Detail & Related papers (2022-01-14T17:09:22Z) - Comparing Supervised Models And Learned Speech Representations For
Classifying Intelligibility Of Disordered Speech On Selected Phrases [11.3463024120429]
We develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases.
We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases.
arXiv Detail & Related papers (2021-07-08T17:24:25Z) - A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker
Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC)
The poor quality of dysarthric speech can be greatly improved by statistical VC.
But as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z) - Learning Explicit Prosody Models and Deep Speaker Embeddings for
Atypical Voice Conversion [60.808838088376675]
We propose a VC system with explicit prosodic modelling and deep speaker embedding learning.
A prosody corrector takes in phoneme embeddings to infer typical phoneme duration and pitch values.
A conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech.
arXiv Detail & Related papers (2020-11-03T13:08:53Z) - Improving speaker discrimination of target speech extraction with
time-domain SpeakerBeam [100.95498268200777]
SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics.
SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures.
We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
arXiv Detail & Related papers (2020-01-23T05:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.