Protecting gender and identity with disentangled speech representations
- URL: http://arxiv.org/abs/2104.11051v1
- Date: Thu, 22 Apr 2021 13:31:41 GMT
- Title: Protecting gender and identity with disentangled speech representations
- Authors: Dimitrios Stoidis and Andrea Cavallaro
- Abstract summary: We show that protecting gender information in speech is more effective than modelling speaker-identity information.
We present a novel way to encode gender information and disentangle two sensitive biometric identifiers.
- Score: 49.00162808063399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Besides its linguistic content, our speech is rich in biometric information
that can be inferred by classifiers. Learning privacy-preserving
representations for speech signals enables downstream tasks without sharing
unnecessary, private information about an individual. In this paper, we show
that protecting gender information in speech is more effective than modelling
speaker-identity information alone when generating a non-sensitive
representation of speech. Our method relies on reconstructing speech by
decoding linguistic content along with gender information using a variational
autoencoder. Specifically, we exploit disentangled representation learning to
encode information about different attributes into separate subspaces that can
be factorised independently. We present a novel way to encode gender
information and disentangle two sensitive biometric identifiers, namely gender
and identity, in a privacy-protecting setting. Experiments on the LibriSpeech
dataset show that gender recognition and speaker verification can be reduced to
a random guess, protecting against classification-based attacks, while
maintaining the utility of the signal for speech recognition.
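The disentanglement described in the abstract can be pictured as a VAE whose latent space is split into a content subspace and a gender subspace, with the decoder conditioned on an externally supplied gender code. Below is a minimal illustration in which randomly initialised linear maps stand in for trained networks; all dimensions, names, and the one-hot gender code are hypothetical and not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): e.g. 80 mel bins per frame,
# a 16-dim content subspace, and a 2-dim gender subspace.
D_IN, D_CONTENT, D_ATTR = 80, 16, 2

# Randomly initialised linear maps stand in for trained encoder/decoder nets.
W_enc = rng.standard_normal((D_IN, 2 * (D_CONTENT + D_ATTR))) * 0.01
W_dec = rng.standard_normal((D_CONTENT + D_ATTR, D_IN)) * 0.01

def encode(x):
    """Map a frame to mean/log-variance, split into two factorised subspaces."""
    h = x @ W_enc
    mu, logvar = np.split(h, 2)
    content = (mu[:D_CONTENT], logvar[:D_CONTENT])
    attr = (mu[D_CONTENT:], logvar[D_CONTENT:])
    return content, attr

def reparameterize(mu, logvar):
    """Standard VAE reparameterisation trick."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def decode(z_content, gender_code):
    """Reconstruct from the content latent plus an externally supplied gender
    code, so the content subspace need not carry gender information."""
    return np.concatenate([z_content, gender_code]) @ W_dec

x = rng.standard_normal(D_IN)          # one (fake) speech frame
content, attr = encode(x)
z_c = reparameterize(*content)
x_hat = decode(z_c, np.array([1.0, 0.0]))  # one-hot gender code
```

In training, each factorised subspace would receive its own KL term, and at generation time the gender code can be swapped or neutralised to protect the attribute.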
Related papers
- SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data [100.46303484627045]
We propose a cross-modal Speech and Language Model (SpeechLM) to align speech and text pre-training with a pre-defined unified representation.
Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities.
We evaluate SpeechLM on various spoken language processing tasks including speech recognition, speech translation, and universal representation evaluation framework SUPERB.
arXiv Detail & Related papers (2022-09-30T09:12:10Z) - Generating gender-ambiguous voices for privacy-preserving speech
recognition [38.733077459065704]
We present a generative adversarial network, GenGAN, that synthesises voices that conceal the gender or identity of a speaker.
We condition the generator only on gender information and use an adversarial loss between signal distortion and privacy preservation.
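The trade-off described above, between signal distortion and privacy preservation, can be illustrated as a combined objective that penalises both reconstruction error and confident gender predictions by an adversary. This is a schematic sketch only; the loss terms and weighting are assumptions, not GenGAN's actual formulation:

```python
import numpy as np

def combined_loss(x, x_hat, adv_gender_prob, lam=0.5):
    """Toy generator objective balancing distortion against privacy."""
    # Signal-distortion term: how far the generated voice is from the input.
    distortion = float(np.mean((x - x_hat) ** 2))
    # Privacy term: push the adversary's gender posterior toward chance
    # (0.5 for a binary attribute), i.e. penalise confident predictions.
    privacy = float((adv_gender_prob - 0.5) ** 2)
    return distortion + lam * privacy
```

With equal distortion, a voice that leaves the gender classifier at chance level incurs a lower loss than one the classifier identifies confidently.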
arXiv Detail & Related papers (2022-07-03T14:23:02Z) - Improving speaker de-identification with functional data analysis of f0
trajectories [10.809893662563926]
Formant modification is a simpler, yet effective, method for speaker de-identification that requires no training data.
This study introduces a novel speaker de-identification method, which, in addition to simple formant shifts, manipulates f0 trajectories based on functional data analysis.
The proposed method conceals potentially identifying pitch characteristics in a phonetically controllable manner and improves formant-based speaker de-identification by up to 25%.
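The f0 manipulation can be sketched as a semitone shift followed by pulling each voiced frame toward the utterance mean, masking speaker-specific pitch dynamics. This is a toy illustration only; the paper's actual method manipulates full trajectories via functional data analysis, which this sketch does not implement:

```python
import numpy as np

def deidentify_f0(f0, shift_st=2.0, flatten=0.5):
    """Shift voiced f0 frames by `shift_st` semitones, then pull each frame
    toward the utterance mean to mask speaker-specific pitch dynamics.
    Unvoiced frames (f0 == 0) are left untouched."""
    voiced = f0 > 0
    out = f0.copy()
    shifted = f0[voiced] * 2.0 ** (shift_st / 12.0)  # semitone scaling
    out[voiced] = (1 - flatten) * shifted + flatten * shifted.mean()
    return out
```

A 12-semitone shift doubles the pitch; `flatten=1.0` would collapse the trajectory to a monotone, trading more privacy for less natural prosody.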
arXiv Detail & Related papers (2022-03-31T01:34:15Z) - Privacy-Preserving Speech Representation Learning using Vector
Quantization [0.0]
Speech signals contain a lot of sensitive information, such as the speaker's identity, which raises privacy concerns.
This paper aims to produce an anonymous representation while preserving speech recognition performance.
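Quantization anonymises by mapping each frame vector to its nearest codebook centroid, so fine-grained within-cell variation that could identify the speaker is discarded. A minimal sketch with a hypothetical two-entry codebook; real systems learn much larger codebooks jointly with the recogniser:

```python
import numpy as np

def vector_quantize(frames, codebook):
    """Replace each frame (row) with its nearest codebook centroid.
    Speaker-specific variation within each quantization cell is discarded."""
    # Pairwise squared distances: (n_frames, n_codes)
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[d.argmin(1)]
```

Because every output row is literally a codebook entry, the representation can only carry as much speaker information as the codebook assignment pattern itself.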
arXiv Detail & Related papers (2022-03-15T14:01:11Z) - Differentially Private Speaker Anonymization [44.90119821614047]
Sharing real-world speech utterances is key to the training and deployment of voice-based services.
Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact.
We show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information.
arXiv Detail & Related papers (2022-02-23T23:20:30Z) - An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive a speech representation that flexibly addresses these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performance on identity-free speech emotion recognition (SER) and better performance on emotionless speaker verification (SV).
arXiv Detail & Related papers (2021-06-05T06:19:14Z) - Adversarial Disentanglement of Speaker Representation for
Attribute-Driven Privacy Preservation [17.344080729609026]
We introduce the concept of attribute-driven privacy preservation in speaker voice representation.
It allows a person to hide one or more personal attributes from a potential malicious interceptor and from the application provider.
We propose an adversarial autoencoding method that disentangles in the voice representation a given speaker attribute thus allowing its concealment.
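Adversarial disentanglement of this kind can be sketched as an encoder objective that trades reconstruction quality against the attribute classifier's confidence, entering the classifier's cross-entropy with a negative sign so the encoder is rewarded when the attribute becomes unpredictable. A schematic loss for a binary attribute; the names and weighting are illustrative, not the paper's exact formulation:

```python
import numpy as np

def encoder_objective(x, x_hat, attr_logit, attr_label, lam=1.0):
    """Encoder loss: reconstruct well while fooling the attribute classifier.
    The classifier's cross-entropy enters with a negative sign, so a
    confused classifier (high cross-entropy) lowers the encoder's loss."""
    recon = float(np.mean((x - x_hat) ** 2))
    p = 1.0 / (1.0 + np.exp(-attr_logit))  # sigmoid over the attribute logit
    ce = -(attr_label * np.log(p) + (1 - attr_label) * np.log(1 - p))
    return recon - lam * float(ce)
```

In practice the attribute classifier is trained simultaneously with the opposite sign on the same term, yielding the usual min-max adversarial game.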
arXiv Detail & Related papers (2020-12-08T14:47:23Z) - Speaker De-identification System using Autoencoders and Adversarial
Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z) - Disentangled Speech Embeddings using Cross-modal Self-supervision [119.94362407747437]
We develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces and audio in video.
We construct a two-stream architecture which: (1) shares low-level features common to both representations; and (2) provides a natural mechanism for explicitly disentangling these factors.
arXiv Detail & Related papers (2020-02-20T14:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.