Speaker Anonymization with Distribution-Preserving X-Vector Generation
for the VoicePrivacy Challenge 2020
- URL: http://arxiv.org/abs/2010.13457v2
- Date: Tue, 5 Jan 2021 16:11:35 GMT
- Authors: Henry Turner, Giulio Lovisotto and Ivan Martinovic
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a Distribution-Preserving Voice Anonymization
technique, as our submission to the VoicePrivacy Challenge 2020. We observe
that the challenge baseline system generates fake X-vectors which are very
similar to each other, significantly more so than those extracted from organic
speakers. This difference arises from averaging many X-vectors from a pool of
speakers in the anonymization process, causing a loss of information. We
propose a new method to generate fake X-vectors which overcomes these
limitations by preserving the distributional properties of X-vectors and their
intra-similarity. We use population data to learn the properties of the
X-vector space, before fitting a generative model which we use to sample fake
X-vectors. We show how this approach generates X-vectors that more closely
follow the expected intra-similarity distribution of organic speaker X-vectors.
Our method can be easily integrated into other systems as the anonymization
component and removes the need to distribute a pool of speakers to use during
anonymization. In scenarios where both enrollment and trial utterances are
anonymized, our approach increases the EER by up to $19.4\%$ for male speakers
and $11.1\%$ for female speakers compared with the baseline solution,
demonstrating the diversity of our generated voices.
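The effect described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the pool below is synthetic data standing in for real X-vectors, the number of averaged vectors (`n_avg = 100`) is an assumed value, and a single multivariate Gaussian is used as a stand-in generative model because the abstract does not say which model is fitted. All function names are hypothetical.

```python
import numpy as np


def mean_pairwise_cosine(x):
    """Mean cosine similarity over all distinct pairs of rows in x."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = unit @ unit.T
    upper = np.triu_indices(len(x), k=1)  # each unordered pair once
    return float(sims[upper].mean())


def averaged_fakes(pool, n_fakes, n_avg, rng):
    """Averaging-style fakes: each one is the mean of n_avg pool vectors."""
    picks = np.stack(
        [rng.choice(len(pool), size=n_avg, replace=False) for _ in range(n_fakes)]
    )
    return pool[picks].mean(axis=1)


def sampled_fakes(pool, n_fakes, rng):
    """Fit one Gaussian to the pool (an assumed model) and sample from it."""
    mean = pool.mean(axis=0)
    cov = np.cov(pool, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n_fakes)


rng = np.random.default_rng(0)

# Synthetic "organic" pool: a shared offset plus per-speaker variation.
# Real X-vectors would be extracted from speech; this is only a stand-in.
dim, n_speakers = 64, 500
pool = 0.5 + rng.normal(size=(n_speakers, dim))

organic_sim = mean_pairwise_cosine(pool)
averaged_sim = mean_pairwise_cosine(averaged_fakes(pool, 200, 100, rng))
sampled_sim = mean_pairwise_cosine(sampled_fakes(pool, 200, rng))

print(f"organic  : {organic_sim:.3f}")
print(f"averaged : {averaged_sim:.3f}")
print(f"sampled  : {sampled_sim:.3f}")
```

On this toy data, averaging cancels most of the per-speaker variation, so the averaged fakes are far more similar to each other than the organic vectors are, while the fakes sampled from the fitted distribution stay close to the organic similarity level.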
Related papers
- DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER on the CN-Celeb evaluation set.
(2023-10-18T17:07:05Z)
- Vocoder drift compensation by x-vector alignment in speaker anonymisation
This paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody.
Also reported is an original approach to vocoder drift compensation.
(2023-07-17T11:38:35Z)
- Dior-CVAE: Pre-trained Language Models and Diffusion Priors for Variational Dialog Generation
Dior-CVAE is a hierarchical conditional variational autoencoder (CVAE) with diffusion priors to address these challenges.
We employ a diffusion model to increase the complexity of the prior distribution and its compatibility with the distributions produced by a PLM.
Experiments across two commonly used open-domain dialog datasets show that our method can generate more diverse responses without large-scale dialog pre-training.
(2023-05-24T11:06:52Z)
- Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
We show how the upper bound can be generalized to the case of random generative models.
We show state-of-the-art results on 2, 3, 5, 10, and 20 speakers on multiple benchmarks.
(2023-01-25T18:21:51Z)
- Symmetric Saliency-based Adversarial Attack To Speaker Identification
We propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED).
First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system.
Second, it proposes an angular loss function to push the speaker embedding far away from the source speaker.
(2022-10-30T08:54:02Z)
- Dictionary Attacks on Speaker Verification
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
(2022-04-24T15:31:41Z)
- On the invertibility of a voice privacy system using embedding alignement
This paper explores various attack scenarios on a voice anonymization system using embedding alignment techniques.
We compute the optimal rotation and compare the results of this approximation to the official Voice Privacy Challenge results.
(2021-10-08T14:43:47Z)
- Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds
Clustering-based approaches assign speaker labels to speech regions by clustering speaker embeddings such as x-vectors.
End-to-end neural diarization (EEND) directly predicts diarization labels using a neural network.
We propose a simple but effective hybrid diarization framework that works with overlapped speech and for long recordings containing an arbitrary number of speakers.
(2020-10-26T06:33:02Z)
- Design Choices for X-vector Based Speaker Anonymization
We present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge.
Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.
(2020-05-18T11:32:14Z)
- Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
We propose a novel Target-Speaker Voice Activity Detection (TS-VAD) approach.
TS-VAD directly predicts the activity of each speaker in each time frame.
Experiments on the CHiME-6 unsegmented data show that TS-VAD achieves state-of-the-art results.
(2020-05-14T21:24:56Z)