Pathological voice adaptation with autoencoder-based voice conversion
- URL: http://arxiv.org/abs/2106.08427v1
- Date: Tue, 15 Jun 2021 20:38:10 GMT
- Title: Pathological voice adaptation with autoencoder-based voice conversion
- Authors: Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velazquez, Odette Scharenborg
- Abstract summary: Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speaker's voice characteristics.
This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a new approach to pathological speech synthesis.
Instead of using healthy speech as a source, we customise an existing
pathological speech sample to a new speaker's voice characteristics. This
approach alleviates the evaluation problem one normally has when converting
typical speech to pathological speech because, in our approach, the voice
conversion (VC) model does not need to be optimised for speech degradation but
only for the speaker change. This change in the optimisation ensures that any
degradation found in naturalness is due to the conversion process and not to
the model exaggerating characteristics of a speech pathology. As a proof of
concept of this method, we convert dysarthric speech using the UASpeech
database and an autoencoder-based VC technique. Subjective evaluation results
show reasonable naturalness for high-intelligibility dysarthric speakers,
though lower intelligibility seems to introduce a marginal degradation in
naturalness scores for mid- and low-intelligibility speakers compared to
ground truth. Conversion of speaker characteristics is successful for low- and
high-intelligibility speakers, but not for mid-intelligibility speakers.
Whether the differences in the results across intelligibility levels are due
to the intelligibility levels themselves or to the individual speakers needs
to be further investigated.
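The abstract gives no architecture details beyond "autoencoder-based". A minimal sketch, assuming an AutoVC-style design (a content encoder with a narrow bottleneck, plus speaker embeddings fed to encoder and decoder), illustrates why the model only has to learn the speaker change: training reconstructs each utterance with its own speaker embedding, and conversion simply swaps in the target speaker's embedding at the decoder, so the pathological content of the source sample is never an optimisation target. Everything here is hypothetical: the class and function names, the sizes (80 mel bins, 16-dim speaker embedding, 4-dim bottleneck), and the untrained linear layers. A real system would use trained neural networks, a speaker encoder, and a vocoder.

```python
import random
from typing import List

Frame = List[float]          # one (hypothetical) mel-spectrogram frame
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would learn these by training."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyAutoencoderVC:
    """Untrained, linear stand-in for an autoencoder-based VC model (hypothetical)."""

    def __init__(self, n_mels: int = 80, spk_dim: int = 16,
                 bottleneck: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.enc = random_matrix(bottleneck, n_mels + spk_dim, rng)
        self.dec = random_matrix(n_mels, bottleneck + spk_dim, rng)

    def encode(self, frames: List[Frame], src_spk: List[float]) -> List[List[float]]:
        # Content encoder: each frame is concatenated with the source speaker
        # embedding and squeezed through a narrow bottleneck. In a trained
        # AutoVC-style model the bottleneck is meant to keep content while
        # discarding speaker identity.
        return [matvec(self.enc, f + src_spk) for f in frames]

    def decode(self, codes: List[List[float]], spk: List[float]) -> List[Frame]:
        # Decoder: re-attaches a speaker embedding to every content code.
        return [matvec(self.dec, c + spk) for c in codes]

    def convert(self, frames: List[Frame], src_spk: List[float],
                tgt_spk: List[float]) -> List[Frame]:
        # Conversion: encode with the source speaker, decode with the target.
        return self.decode(self.encode(frames, src_spk), tgt_spk)


def reconstruction_loss(model: ToyAutoencoderVC, frames: List[Frame],
                        spk: List[float]) -> float:
    """Training objective (mean squared error): rebuild the input with the SAME speaker."""
    recon = model.decode(model.encode(frames, spk), spk)
    n = sum(len(f) for f in frames)
    return sum((a - b) ** 2
               for f, r in zip(frames, recon)
               for a, b in zip(f, r)) / n


# Usage with random placeholder data (not real UASpeech features).
rng = random.Random(1)
model = ToyAutoencoderVC()
utterance = [[rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(50)]  # 50 frames
src_spk = [rng.gauss(0.0, 1.0) for _ in range(16)]  # pathological source speaker
tgt_spk = [rng.gauss(0.0, 1.0) for _ in range(16)]  # new target speaker
converted = model.convert(utterance, src_spk, tgt_spk)
print(len(converted), len(converted[0]))  # 50 80
```

The design choice the sketch highlights is that the only thing that differs between training and conversion is which speaker embedding reaches the decoder; the frame sequence of the pathological source (its timing and articulation) flows through the same content path in both cases.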