Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
- URL: http://arxiv.org/abs/2410.15500v1
- Date: Sun, 20 Oct 2024 20:40:56 GMT
- Title: Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
- Authors: Suhita Ghosh, Melanie Jouaiti, Arnab Das, Yamini Sinha, Tim Polzehl, Ingo Siegert, Sebastian Stober
- Abstract summary: Speech anonymisation aims to protect speaker identity by changing personal identifiers in speech while retaining linguistic content.
Current methods fail to retain prosody and unique speech patterns found in elderly and pathological speech domains.
We propose a voice conversion-based method (DDSP-QbE) using differentiable digital signal processing and query-by-example.
- Score: 4.42160195007899
- Abstract: Speech anonymisation aims to protect speaker identity by changing personal identifiers in speech while retaining linguistic content. Current methods fail to retain prosody and unique speech patterns found in elderly and pathological speech domains, which is essential for remote health monitoring. To address this gap, we propose a voice conversion-based method (DDSP-QbE) using differentiable digital signal processing and query-by-example. The proposed method, trained with novel losses, aids in disentangling linguistic, prosodic, and domain representations, enabling the model to adapt to uncommon speech patterns. Objective and subjective evaluations show that DDSP-QbE significantly outperforms the voice conversion state-of-the-art concerning intelligibility, prosody, and domain preservation across diverse datasets, pathologies, and speakers while maintaining quality and speaker anonymity. Experts validate domain preservation by analysing twelve clinically pertinent domain attributes.
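The DDSP side of DDSP-QbE can be illustrated with a differentiable-style harmonic synthesizer: frame-rate controls (an f0 contour and per-harmonic amplitudes) are upsampled to audio rate and rendered by an oscillator bank. This is a minimal numpy sketch of the general technique, not the authors' implementation; the function name, hop size, and parameters are illustrative.

```python
import numpy as np

def harmonic_synth(f0, amplitudes, sample_rate=16000, hop=64):
    """Render audio from a frame-rate f0 contour and per-harmonic amplitudes.

    f0:          (n_frames,) fundamental frequency in Hz
    amplitudes:  (n_frames, n_harmonics) linear amplitude per harmonic
    hop:         samples per control frame (illustrative value)
    """
    n_frames, n_harmonics = amplitudes.shape
    n_samples = n_frames * hop

    # Upsample frame-rate controls to audio rate by linear interpolation.
    t_frames = np.arange(n_frames) * hop
    t_audio = np.arange(n_samples)
    f0_up = np.interp(t_audio, t_frames, f0)
    amp_up = np.stack([np.interp(t_audio, t_frames, amplitudes[:, k])
                       for k in range(n_harmonics)], axis=1)

    # Integrate instantaneous frequency to phase, then sum the harmonics.
    phase = 2 * np.pi * np.cumsum(f0_up) / sample_rate
    harmonics = np.arange(1, n_harmonics + 1)
    return (amp_up * np.sin(phase[:, None] * harmonics)).sum(axis=1)

# A 100-frame, 220 Hz tone with four harmonics of decaying amplitude.
f0 = np.full(100, 220.0)
amps = np.tile(1.0 / np.arange(1, 5), (100, 1))
audio = harmonic_synth(f0, amps)
```

Because every step is differentiable, a real DDSP model can backpropagate through this synthesis into the networks that predict the controls.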
Related papers
- Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses [0.08155575318208629]
Speech anonymization needs to obscure a speaker's identity while retaining critical information for subsequent tasks.
Our research underscores the importance of loss functions inspired by the human auditory system.
Our proposed loss functions are model-agnostic, incorporating handcrafted and deep learning-based features to effectively capture quality representations.
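One common handcrafted, perception-motivated loss of the kind the summary describes is a multi-resolution spectral distance, which compares magnitude spectra at several analysis window sizes. The sketch below is a generic illustration under that assumption, not the paper's exact loss functions.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitude STFT via a sliding Hann window (simplified, no padding)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def multires_spectral_loss(pred, target, ffts=(256, 512, 1024)):
    """Average L1 distance between magnitude spectra at several resolutions."""
    loss = 0.0
    for n_fft in ffts:
        p = stft_mag(pred, n_fft, n_fft // 4)
        t = stft_mag(target, n_fft, n_fft // 4)
        loss += np.mean(np.abs(p - t))
    return loss / len(ffts)
```

Such a loss is model-agnostic in the sense the summary uses: it constrains only the output audio, so it can be attached to any generator.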
arXiv Detail & Related papers (2024-10-20T20:33:44Z)
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs), and balance.
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
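Two of the feature families named above, zero-crossing rate and energy, can be computed framewise from the raw waveform. A minimal numpy sketch, with illustrative frame and hop sizes (25 ms / 10 ms at 16 kHz), not the study's exact extraction pipeline:

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Per-frame zero-crossing rate and log energy from a raw waveform.

    Returns an (n_frames, 2) array: column 0 is the fraction of adjacent
    sample pairs whose sign differs, column 1 is the log frame energy.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        energy = np.log(np.sum(frame ** 2) + 1e-10)  # floor avoids log(0)
        feats.append((zcr, energy))
    return np.array(feats)
```

For a pure tone of frequency f sampled at rate sr, the zero-crossing rate per sample is approximately 2f/sr, which makes the feature easy to sanity-check.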
arXiv Detail & Related papers (2024-05-03T02:59:15Z)
- On the Impact of Voice Anonymization on Speech Diagnostic Applications: a Case Study on COVID-19 Detection [13.227360396362707]
There is growing interest in voice anonymization to preserve speaker privacy and identity.
For affective computing and disease monitoring applications, however, the para-linguistic content may be more critical.
We test three anonymization methods and their impact on five different state-of-the-art COVID-19 diagnostic systems.
arXiv Detail & Related papers (2023-04-05T01:09:58Z)
- Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggest that incorporating the generated articulatory features consistently outperforms the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z)
- Improving speaker de-identification with functional data analysis of f0 trajectories [10.809893662563926]
Formant modification is a simpler, yet effective method for speaker de-identification which requires no training data.
This study introduces a novel speaker de-identification method, which, in addition to simple formant shifts, manipulates f0 trajectories based on functional data analysis.
The proposed speaker de-identification method conceals plausibly identifying pitch characteristics in a phonetically controllable manner and improves formant-based speaker de-identification by up to 25%.
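The functional-data-analysis idea can be sketched as follows: represent the f0 trajectory with a low-order basis (here a polynomial fit), then shift and compress the coefficients so that speaker-specific pitch level and movement are concealed. The basis choice, shift, and scale factor are illustrative assumptions, not the paper's method.

```python
import numpy as np

def smooth_f0(f0, degree=3):
    """Project an f0 contour onto a low-order polynomial basis (FDA-style)."""
    t = np.linspace(-1, 1, len(f0))
    coeffs = np.polynomial.polynomial.polyfit(t, f0, degree)
    return coeffs, np.polynomial.polynomial.polyval(t, coeffs)

def deidentify_f0(f0, mean_shift=-20.0, dynamics_scale=0.6, degree=3):
    """Shift the mean and compress the dynamics of the smoothed contour."""
    coeffs, _ = smooth_f0(f0, degree)
    coeffs[0] += mean_shift          # move the overall pitch level
    coeffs[1:] *= dynamics_scale     # flatten speaker-specific movement
    t = np.linspace(-1, 1, len(f0))
    return np.polynomial.polynomial.polyval(t, coeffs)
```

Operating on basis coefficients rather than raw frames is what keeps the manipulation smooth and phonetically controllable.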
arXiv Detail & Related papers (2022-03-31T01:34:15Z)
- Unsupervised Domain Adaptation in Speech Recognition using Phonetic Features [6.872447420442981]
We propose a technique to perform unsupervised gender-based domain adaptation in speech recognition using phonetic features.
Experiments are performed on the TIMIT dataset and there is a considerable decrease in the phoneme error rate using the proposed approach.
arXiv Detail & Related papers (2021-08-04T06:22:12Z)
- VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion [54.29557210925752]
One-shot voice conversion can be effectively achieved by speech representation disentanglement.
We employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training.
Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations.
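The vector-quantization step used for content encoding amounts to snapping each continuous feature vector to its nearest entry in a learned codebook. A minimal numpy sketch of that lookup (the codebook learning and MI estimation are omitted; names are illustrative):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Replace each content vector with its nearest codebook entry.

    z:        (n_frames, dim) continuous content features
    codebook: (n_codes, dim)  learned code vectors
    Returns the quantized features and the chosen code indices.
    """
    # Pairwise squared distances between every frame and every code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx
```

The discrete bottleneck discards fine speaker detail, which is why VQ helps separate content from speaker identity.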
arXiv Detail & Related papers (2021-06-18T13:50:38Z)
- High Fidelity Speech Regeneration with Application to Speech Enhancement [96.34618212590301]
We propose a wav-to-wav generative model for speech that can generate 24 kHz speech in real time.
Inspired by voice conversion methods, we train to augment the speech characteristics while preserving the identity of the source.
arXiv Detail & Related papers (2021-01-31T10:54:27Z)
- DEAAN: Disentangled Embedding and Adversarial Adaptation Network for Robust Speaker Representation Learning [69.70594547377283]
We propose a novel framework to disentangle speaker-related and domain-specific features.
Our framework can effectively generate more speaker-discriminative and domain-invariant speaker representations.
arXiv Detail & Related papers (2020-12-12T19:46:56Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.