A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker
Identity in Dysarthric Voice Conversion
- URL: http://arxiv.org/abs/2106.01415v1
- Date: Wed, 2 Jun 2021 18:41:03 GMT
- Authors: Wen-Chin Huang, Kazuhiro Kobayashi, Yu-Huai Peng, Ching-Feng Liu, Yu
Tsao, Hsin-Min Wang, Tomoki Toda
- Abstract summary: We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC).
The poor quality of dysarthric speech can be greatly improved by statistical VC.
But as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new paradigm for maintaining speaker identity in dysarthric
voice conversion (DVC). The poor quality of dysarthric speech can be greatly
improved by statistical VC, but as the normal speech utterances of a dysarthria
patient are nearly impossible to collect, previous work failed to recover the
individuality of the patient. In light of this, we suggest a novel, two-stage
approach for DVC, which is highly flexible in that no normal speech of the
patient is required. First, a powerful parallel sequence-to-sequence model
converts the input dysarthric speech into a normal speech of a reference
speaker as an intermediate product, and a nonparallel, frame-wise VC model
realized with a variational autoencoder then converts the speaker identity of
the reference speech back to that of the patient while assumed to be capable of
preserving the enhanced quality. We investigate several design options.
Experimental evaluation results demonstrate the potential of our approach to
improving the quality of the dysarthric speech while maintaining the speaker
identity.
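The two-stage flow described in the abstract can be sketched as a simple composition of two converters. This is only an illustrative sketch, not the authors' implementation: `two_stage_dvc`, `toy_seq2seq` and `make_toy_framewise_vc` are hypothetical names, and the toy stand-ins replace the real sequence-to-sequence and variational autoencoder models with trivial arithmetic. The one property it tries to mirror is that stage 1 may change the number of frames, while a frame-wise stage 2 keeps it fixed.

```python
from typing import Callable, List

Frame = List[float]        # one acoustic feature vector (e.g. a spectral frame)
Utterance = List[Frame]    # a sequence of frames
Converter = Callable[[Utterance], Utterance]


def two_stage_dvc(dysarthric: Utterance, stage1: Converter, stage2: Converter) -> Utterance:
    """Chain the stages: dysarthric -> reference speaker -> patient identity."""
    intermediate = stage1(dysarthric)  # normal speech in the reference speaker's voice
    return stage2(intermediate)        # same content, converted back to the patient


def toy_seq2seq(utt: Utterance) -> Utterance:
    """Hypothetical stand-in for stage 1. A sequence-level model may change
    the utterance length; here we simply keep every second frame."""
    return [list(frame) for frame in utt[::2]]


def make_toy_framewise_vc(reference_code: Frame, patient_code: Frame) -> Converter:
    """Hypothetical stand-in for stage 2. Each frame is mapped independently:
    remove the reference speaker's offset (a crude 'encode' step), then add
    the patient's offset (a crude speaker-conditioned 'decode' step)."""
    def convert(utt: Utterance) -> Utterance:
        converted = []
        for frame in utt:
            latent = [x - r for x, r in zip(frame, reference_code)]
            converted.append([z + p for z, p in zip(latent, patient_code)])
        return converted
    return convert


if __name__ == "__main__":
    dysarthric = [[1.0, 2.0], [1.5, 2.5], [3.0, 4.0], [3.5, 4.5]]
    to_patient = make_toy_framewise_vc([0.5, 0.5], [-1.0, 1.0])
    print(two_stage_dvc(dysarthric, toy_seq2seq, to_patient))
```

Keeping the stages as interchangeable callables reflects the flexibility claimed in the abstract: the second stage needs no normal speech from the patient, so it can be swapped or retrained independently of the first.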
Related papers
- Speaker-Independent Dysarthria Severity Classification using
Self-Supervised Transformers and Multi-Task Learning [2.7706924578324665]
This study presents a transformer-based framework for automatically assessing dysarthria severity from raw speech data.
We develop a framework, called Speaker-Agnostic Latent Regularisation (SALR), incorporating a multi-task learning objective and contrastive learning for speaker-independent multi-class dysarthria severity classification.
Our model demonstrated superior performance over traditional machine learning approaches, with an accuracy of 70.48% and an F1 score of 59.23%.
arXiv Detail & Related papers (2024-02-29T18:30:52Z)
- Use of Speech Impairment Severity for Dysarthric Speech Recognition [37.93801885333925]
This paper proposes a novel set of techniques to use both severity and speaker-identity in dysarthric speech recognition.
Experiments conducted on UASpeech suggest incorporating speech impairment severity into state-of-the-art hybrid DNN, E2E Conformer and pre-trained Wav2vec 2.0 ASR systems.
arXiv Detail & Related papers (2023-05-18T02:42:59Z)
- On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and
Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z)
- Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric
and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users.
Motivated by the spectro-temporal differences between dysarthric, elderly and normal speech, this paper proposes novel spectro-temporal subspace basis deep embedding features derived using SVD of the speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z)
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by
Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
A speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
- Investigation of Data Augmentation Techniques for Disordered Speech
Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition.
Both normal and disordered speech were exploited in the augmentation process.
The final speaker-adapted system, constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation, produced up to 2.92% absolute word error rate (WER) reduction.
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
- Towards Identity Preserving Normal to Dysarthric Voice Conversion [37.648612382457756]
We present a framework that converts normal speech into dysarthric speech while preserving the speaker identity.
This is essential for (1) clinical decision making processes and alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition.
arXiv Detail & Related papers (2021-10-15T17:18:02Z)
- Pathological voice adaptation with autoencoder-based voice conversion [15.687800631199616]
Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speaker's voice characteristics.
This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech.
arXiv Detail & Related papers (2021-06-15T20:38:10Z)
- Learning Explicit Prosody Models and Deep Speaker Embeddings for
Atypical Voice Conversion [60.808838088376675]
We propose a VC system with explicit prosodic modelling and deep speaker embedding learning.
A prosody corrector takes in phoneme embeddings to infer typical phoneme duration and pitch values.
A conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech.
arXiv Detail & Related papers (2020-11-03T13:08:53Z)