Speaker De-identification System using Autoencoders and Adversarial
Training
- URL: http://arxiv.org/abs/2011.04696v1
- Date: Mon, 9 Nov 2020 19:22:05 GMT
- Title: Speaker De-identification System using Autoencoders and Adversarial
Training
- Authors: Fernando M. Espinoza-Cuadros, Juan M. Perero-Codosero, Javier
Antón-Martín, Luis A. Hernández-Gómez
- Abstract summary: We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
- Score: 58.720142291102135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fast increase of web services and mobile apps, which collect personal
data from users, increases the risk that their privacy may be severely
compromised. In particular, the growing variety of spoken language
interfaces and voice assistants, empowered by rapid breakthroughs in
Deep Learning, is prompting serious concerns in the European Union about
preserving speech data privacy. For instance, an attacker can record speech from
users and impersonate them to get access to systems requiring voice
identification. Hacking speaker profiles from users is also possible by means
of existing technology to extract speaker, linguistic (e.g., dialect) and
paralinguistic features (e.g., age) from the speech signal. To
mitigate these weaknesses, in this paper we propose a speaker
de-identification system based on adversarial training and autoencoders
that suppresses speaker, gender, and accent information from speech.
Experimental results show that combining adversarial learning and autoencoders
increases the equal error rate of a speaker verification system while preserving
the intelligibility of the anonymized spoken content.
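The evaluation metric named here, the equal error rate (EER), is the decision threshold at which a verifier's false acceptance rate equals its false rejection rate; successful de-identification pushes a speaker verifier's EER toward 50% (chance). A minimal sketch of computing EER from trial scores; the score distributions below are synthetic toy data, not results from the paper:

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine_scores: similarity scores for same-speaker trials.
    impostor_scores: similarity scores for different-speaker trials.
    """
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer = np.inf, 0.0
    for t in thresholds:
        far = np.mean(impostor_scores >= t)  # false acceptance rate
        frr = np.mean(genuine_scores < t)    # false rejection rate
        if abs(far - frr) < best_gap:        # keep the most balanced point
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

rng = np.random.default_rng(0)
# Well-separated scores (original speech): low EER.
clean_eer = equal_error_rate(rng.normal(2.0, 0.5, 1000), rng.normal(-2.0, 0.5, 1000))
# Heavily overlapping scores (after de-identification): EER rises toward 0.5.
anon_eer = equal_error_rate(rng.normal(0.2, 1.0, 1000), rng.normal(-0.2, 1.0, 1000))
print(clean_eer, anon_eer)
```

The rise in EER on the synthetic "anonymized" scores mirrors what the paper reports: the harder it is to separate genuine from impostor trials, the better the de-identification.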
Related papers
- Accent conversion using discrete units with parallel data synthesized from controllable accented TTS [56.18382038512251]
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity.
Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent.
This paper presents a promising AC model that can convert many accents into a native accent, overcoming these issues.
arXiv Detail & Related papers (2024-09-30T19:52:10Z)
- Anonymizing Speech: Evaluating and Designing Speaker Anonymization
Techniques [1.2691047660244337]
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data.
This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization.
arXiv Detail & Related papers (2023-08-05T16:14:17Z)
- DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion [0.0]
DeID-VC is a speaker de-identification system that converts a real speaker to pseudo speakers.
With the help of PSG, DeID-VC can assign unique pseudo speakers at speaker level or even at utterance level.
arXiv Detail & Related papers (2022-09-09T21:13:08Z)
- Differentially Private Speaker Anonymization [44.90119821614047]
Sharing real-world speech utterances is key to the training and deployment of voice-based services.
Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact.
We show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information.
arXiv Detail & Related papers (2022-02-23T23:20:30Z)
- Protecting gender and identity with disentangled speech representations [49.00162808063399]
We show that protecting gender information in speech is more effective than modelling speaker-identity information.
We present a novel way to encode gender information and disentangle two sensitive biometric identifiers.
arXiv Detail & Related papers (2021-04-22T13:31:41Z)
- Voice Privacy with Smart Digital Assistants in Educational Settings [1.8369974607582578]
We design and evaluate a practical and efficient framework for voice privacy at the source.
The approach combines speaker identification (SID) and speech conversion methods to randomly disguise the identity of users right on the device that records the speech.
We evaluate the ASR performance of the conversion in terms of word error rate and show the promise of this framework in preserving the content of the input speech.
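The word error rate (WER) used in this evaluation is the word-level edit distance between a reference transcript and the ASR hypothesis, normalized by the reference length. A self-contained sketch; the example sentences are invented, not drawn from the paper:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a Levenshtein distance over word sequences."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / len(r)

# One substitution ("the" -> "a") against a 6-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A low WER on converted speech indicates that the disguise preserved the spoken content, which is exactly what the framework is evaluated on.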
arXiv Detail & Related papers (2021-03-24T19:58:45Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and
Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Adversarial Disentanglement of Speaker Representation for
Attribute-Driven Privacy Preservation [17.344080729609026]
We introduce the concept of attribute-driven privacy preservation in speaker voice representation.
It allows a person to hide one or more personal aspects from a potential malicious interceptor and from the application provider.
We propose an adversarial autoencoding method that disentangles in the voice representation a given speaker attribute thus allowing its concealment.
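Adversarial disentanglement of this kind is often realized with a gradient-reversal step: the attribute classifier descends its own loss, while the encoder receives that gradient with flipped sign and so ascends it, erasing the attribute from the representation. A toy NumPy sketch of one reversed encoder update (all shapes, data, and the linear encoder/adversary are invented for illustration, not the paper's architecture):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))             # toy input features
a = (x[:, 0] > 0).astype(float)          # binary sensitive attribute, leaks via dim 0

W = rng.normal(scale=0.1, size=(3, 4))   # linear encoder: z = x @ W.T
v = rng.normal(scale=0.1, size=3)        # adversary: logistic head on z

def adversary_loss(W, v):
    p = sigmoid((x @ W.T) @ v)
    return -np.mean(a * np.log(p) + (1 - a) * np.log(1 - p))

# One gradient-reversal step on the encoder, adversary held fixed.
p = sigmoid((x @ W.T) @ v)
dlogit = (p - a) / len(x)                    # dL/dlogit of the mean BCE
dW = (dlogit[:, None] * v[None, :]).T @ x    # chain rule: dL/dW, shape (3, 4)
lr = 0.1
before = adversary_loss(W, v)
W_rev = W + lr * dW                          # sign flipped: ascend the adversary's loss
after = adversary_loss(W_rev, v)
print(before, after)                         # the encoder step makes the adversary worse
```

Iterating this min-max game (adversary descends, encoder ascends) drives the adversary toward chance accuracy, i.e., the attribute becomes concealed in the representation.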
arXiv Detail & Related papers (2020-12-08T14:47:23Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.