Related papers: Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS

URL: http://arxiv.org/abs/2508.05102v1
Date: Thu, 07 Aug 2025 07:39:48 GMT
Title: Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS
Authors: Anuprabha M, Krishna Gurugubelli, Anil Kumar Vuppala
Abstract summary: Dysarthric speech poses significant challenges in developing assistive technologies. Recent advances in neural speech synthesis, especially zero-shot voice cloning, facilitate synthetic speech generation for data augmentation. We investigate the effectiveness of state-of-the-art F5-TTS in cloning dysarthric speech using the TORGO dataset.
Score: 10.019926246026928
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dysarthric speech poses significant challenges in developing assistive technologies, primarily due to the limited availability of data. Recent advances in neural speech synthesis, especially zero-shot voice cloning, facilitate synthetic speech generation for data augmentation; however, they may introduce biases towards dysarthric speech. In this paper, we investigate the effectiveness of state-of-the-art F5-TTS in cloning dysarthric speech using the TORGO dataset, focusing on intelligibility, speaker similarity, and prosody preservation. We also analyze potential biases using fairness metrics such as Disparate Impact and Parity Difference to assess disparities across dysarthric severity levels. Results show that F5-TTS exhibits a strong bias toward speech intelligibility over speaker and prosody preservation in dysarthric speech synthesis. Insights from this study can help integrate fairness-aware dysarthric speech synthesis, fostering the advancement of more inclusive speech technologies.
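The two fairness metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definitions: Disparate Impact as the ratio of favorable-outcome rates between two groups, and Parity Difference as their difference. The abstract does not say how the paper defines a favorable outcome or which severity groups it compares, so the function names, the binary outcome encoding, and the toy data below are all hypothetical.

```python
def favorable_rate(outcomes):
    """Fraction of favorable outcomes (1 = favorable, 0 = unfavorable) in a group."""
    if not outcomes:
        raise ValueError("group has no samples")
    return sum(outcomes) / len(outcomes)


def disparate_impact(unprivileged, privileged):
    """Ratio of favorable rates; 1.0 means both groups are treated alike."""
    privileged_rate = favorable_rate(privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no favorable outcomes")
    return favorable_rate(unprivileged) / privileged_rate


def parity_difference(unprivileged, privileged):
    """Difference of favorable rates; 0.0 means both groups are treated alike."""
    return favorable_rate(unprivileged) - favorable_rate(privileged)


# Hypothetical toy data: 1 means a cloned utterance passed some quality
# threshold (an assumption; the abstract does not state the criterion).
severe_group = [1, 0, 0, 1]  # favorable rate 0.5
mild_group = [1, 1, 1, 0]    # favorable rate 0.75

di = disparate_impact(severe_group, mild_group)
pd = parity_difference(severe_group, mild_group)
print(f"Disparate Impact: {di:.3f}")
print(f"Parity Difference: {pd:.3f}")
```

Under these definitions, a Disparate Impact well below 1 or a Parity Difference well below 0 would suggest that the first group receives favorable outcomes less often than the second.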

Related papers

Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching [0.0]
Dysarthria is a neurological disorder that significantly impairs speech intelligibility. This necessitates the development of robust dysarthric-to-regular speech conversion techniques.
arXiv Detail & Related papers (2025-06-19T08:24:17Z)
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology [0.0]
This study explores voice cloning to generate synthetic speech replicating the unique patterns of individuals with dysarthria. Using the TORGO dataset, we address data scarcity and privacy challenges in speech-language pathology. We cloned voices from dysarthric and control speakers using a commercial platform, ensuring gender-matched synthetic voices.
arXiv Detail & Related papers (2025-03-03T07:44:49Z)
Accurate synthesis of Dysarthric Speech for ASR data augmentation [5.223856537504927]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility. This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation.
arXiv Detail & Related papers (2023-08-16T15:42:24Z)
Assistive Completion of Agrammatic Aphasic Sentences: A Transfer Learning Approach using Neurolinguistics-based Synthetic Dataset [0.8831954614241233]
Damage to the inferior frontal gyrus can cause agrammatic aphasia. Patients, although able to comprehend, lack the ability to form complete sentences.
arXiv Detail & Related papers (2022-11-10T13:24:02Z)
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users. Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech, novel spectro-temporal subspace basis deep embedding features are derived using SVD of the speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z)
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech. A speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity. We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition [4.637732011720613]
Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility. To have robust dysarthria-specific ASR, sufficient training speech is required. Recent advances in Text-To-Speech synthesis suggest the possibility of using synthesis for data augmentation.
arXiv Detail & Related papers (2022-01-27T15:22:09Z)
Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies. This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
Investigation of Data Augmentation Techniques for Disordered Speech Recognition [69.50670302435174]
This paper investigates a set of data augmentation techniques for disordered speech recognition. Both normal and disordered speech were exploited in the augmentation process. The final speaker adapted system constructed using the UASpeech corpus and the best augmentation approach based on speed perturbation produced up to 2.92% absolute word error rate (WER)
arXiv Detail & Related papers (2022-01-14T17:09:22Z)
A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion [50.040466658605524]
We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC. But as the normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient.
arXiv Detail & Related papers (2021-06-02T18:41:03Z)
Silent Speech Interfaces for Speech Restoration: A Review [59.68902463890532]
Silent speech interface (SSI) research aims to provide alternative and augmentative communication methods for persons with severe speech disorders. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication. Most present-day SSIs have only been validated in laboratory settings for healthy users.
arXiv Detail & Related papers (2020-09-04T11:05:50Z)
