Evaluation of Speaker Anonymization on Emotional Speech
- URL: http://arxiv.org/abs/2305.01759v1
- Date: Sat, 15 Apr 2023 20:50:29 GMT
- Title: Evaluation of Speaker Anonymization on Emotional Speech
- Authors: Hubert Nourtel, Pierre Champion, Denis Jouvet, Anthony Larcher, Marie
Tahon
- Abstract summary: Speech data carries a range of personal information, such as the speaker's identity and emotional state.
Current studies have addressed the topic of preserving speech privacy.
The VoicePrivacy 2020 Challenge (VPC) is about speaker anonymization.
- Score: 9.223908421919733
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Speech data carries a range of personal information, such as the speaker's
identity and emotional state. These attributes can be used for malicious
purposes. With the development of virtual assistants, a new generation of
privacy threats has emerged. Current studies have addressed the topic of
preserving speech privacy. One of them, the VoicePrivacy initiative, aims to
promote the development of privacy-preservation tools for speech technology.
The task selected for the VoicePrivacy 2020 Challenge (VPC) is about speaker
anonymization. The goal is to hide the source speaker's identity while
preserving the linguistic information. The VPC baseline makes use of a voice
conversion system. This paper studies the impact of the speaker anonymization
baseline system of the VPC on emotional information present in speech
utterances. Evaluation is performed following the VPC rules regarding the
attackers' knowledge about the anonymization system. Our results show that the
VPC baseline system does not suppress speakers' emotions against informed
attackers. When comparing anonymized speech to original speech, emotion
recognition performance is degraded by 15% relative on IEMOCAP data, similar
to the degradation observed for the automatic speech recognition system used
to evaluate the preservation of the linguistic information.
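The "15% relative" figure above can be made concrete with a small sketch. The scores below are hypothetical placeholders, not values taken from the paper; the function simply shows how a relative degradation between original and anonymized speech is typically computed from a recognition score such as unweighted average recall (UAR).

```python
def relative_degradation(score_original: float, score_anonymized: float) -> float:
    """Relative drop in a recognition score (e.g. emotion-recognition UAR),
    expressed as a fraction of the original score."""
    return (score_original - score_anonymized) / score_original

# Hypothetical scores for illustration only (not from the paper):
uar_original = 0.60      # emotion recognition score on original speech
uar_anonymized = 0.51    # same score on anonymized speech

drop = relative_degradation(uar_original, uar_anonymized)
print(f"relative degradation: {drop:.0%}")  # prints "relative degradation: 15%"
```

A relative (rather than absolute) drop is the natural choice here because it lets the emotion-recognition degradation be compared directly against the word-error-rate degradation used for the linguistic evaluation.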
Related papers
- On the Generation and Removal of Speaker Adversarial Perturbation for Voice-Privacy Protection [45.49915832081347]
Recent developments in voice-privacy protection have shown positive use cases of the same technique for concealing a speaker's voice attributes.
This paper examines the reversibility property where an entity generating adversarial perturbations is authorized to remove them and restore original speech.
A similar technique could also be used by an investigator to deanonymize voice-protected speech and restore criminals' identities in security and forensic analysis.
arXiv Detail & Related papers (2024-12-12T11:46:07Z) - Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding [46.25816642820348]
We focus on altering the voice attributes against machine recognition while retaining human perception.
A speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech.
Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured, while human perception was preserved, for 60.71% of the processed utterances.
arXiv Detail & Related papers (2024-06-12T13:33:24Z) - Anonymizing Speech: Evaluating and Designing Speaker Anonymization
Techniques [1.2691047660244337]
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.
This thesis proposes solutions for anonymizing speech and for evaluating the degree of anonymization.
arXiv Detail & Related papers (2023-08-05T16:14:17Z) - Anonymizing Speech with Generative Adversarial Networks to Preserve
Speaker Privacy [22.84840887071428]
Speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings.
This typically comes with a privacy-utility trade-off between protection of individuals and usability of the data for downstream applications.
We propose to tackle this issue by generating speaker embeddings using a generative adversarial network with the Wasserstein distance as its cost function.
arXiv Detail & Related papers (2022-10-13T13:12:42Z) - Self-Supervised Speech Representations Preserve Speech Characteristics
while Anonymizing Voices [15.136348385992047]
We train several voice conversion models using self-supervised speech representations.
Converted voices retain a word error rate within 1% of that of the original voice.
Experiments on dysarthric speech data show that speech features relevant to articulation, prosody, phonation and phonology can be extracted from anonymized voices.
arXiv Detail & Related papers (2022-04-04T17:48:01Z) - Differentially Private Speaker Anonymization [44.90119821614047]
Sharing real-world speech utterances is key to the training and deployment of voice-based services.
Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact.
We show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information.
arXiv Detail & Related papers (2022-02-23T23:20:30Z) - An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representation that can flexibly address these issues by attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performances on identity-free SER and a better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z) - Protecting gender and identity with disentangled speech representations [49.00162808063399]
We show that protecting gender information in speech is more effective than modelling speaker-identity information.
We present a novel way to encode gender information and disentangle two sensitive biometric identifiers.
arXiv Detail & Related papers (2021-04-22T13:31:41Z) - Speaker De-identification System using Autoencoders and Adversarial
Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)