Vocoder drift compensation by x-vector alignment in speaker anonymisation
- URL: http://arxiv.org/abs/2307.08403v1
- Date: Mon, 17 Jul 2023 11:38:35 GMT
- Title: Vocoder drift compensation by x-vector alignment in speaker anonymisation
- Authors: Michele Panariello, Massimiliano Todisco, Nicholas Evans
- Abstract summary: This paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody.
Also reported is an original approach to vocoder drift compensation.
- Score: 11.480724899031149
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For the most popular x-vector-based approaches to speaker anonymisation, the
bulk of the anonymisation can stem from vocoding rather than from the core
anonymisation function which is used to substitute an original speaker x-vector
with that of a fictitious pseudo-speaker. This phenomenon can impede the design
of better anonymisation systems since there is a lack of fine-grained control
over the x-vector space. The work reported in this paper explores the origin of
so-called vocoder drift and shows that it is due to the mismatch between the
substituted x-vector and the original representations of the linguistic
content, intonation and prosody. Also reported is an original approach to
vocoder drift compensation. While anonymisation performance degrades as
expected, compensation reduces vocoder drift substantially, offers improved
control over the x-vector space and lays a foundation for the design of better
anonymisation functions in the future.
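To make the notion of drift concrete, the sketch below measures it as the cosine distance between the x-vector used to condition re-synthesis and the x-vector re-extracted from the vocoded output, then applies a naive feedback correction. This is a minimal illustration only: the callables `extract_xvector` and `resynthesise` are hypothetical stand-ins for an extractor and an x-vector-conditioned synthesis pipeline, and the simple feedback update is an assumption of this sketch, not the compensation method proposed in the paper.

```python
# Minimal sketch (not the paper's method): quantify vocoder drift and
# naively correct the conditioning x-vector by feedback.
# `extract_xvector(wav)` and `resynthesise(wav, xvec)` are hypothetical
# stand-ins supplied by the caller.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def vocoder_drift(wav, target_xvec, resynthesise, extract_xvector) -> float:
    """Distance between the x-vector the vocoder was asked to realise
    and the x-vector actually measured in its output."""
    return cosine_distance(target_xvec,
                           extract_xvector(resynthesise(wav, target_xvec)))

def compensate_drift(wav, target_xvec, resynthesise, extract_xvector,
                     lr: float = 0.5, steps: int = 5) -> np.ndarray:
    """Nudge the conditioning x-vector so that the x-vector measured in
    the synthesised speech moves towards the intended pseudo-speaker."""
    cond = np.asarray(target_xvec, dtype=float).copy()
    for _ in range(steps):
        measured = extract_xvector(resynthesise(wav, cond))
        cond = cond + lr * (target_xvec - measured)  # close the feedback loop
    return cond
```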
Related papers
- Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures.
We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem.
SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
arXiv Detail & Related papers (2024-03-26T09:25:57Z)
- Look-back Decoding for Open-Ended Text Generation [62.53302138266465]
We propose Look-back, an improved decoding algorithm that tracks the distribution distance between current and historical decoding steps.
Look-back can automatically predict potential repetitive phrases and topic drift, and remove tokens that may cause these failure modes.
We perform decoding experiments on document continuation and story generation, and demonstrate that Look-back is able to generate more fluent and coherent text.
arXiv Detail & Related papers (2023-05-22T20:42:37Z)
- AudioSlots: A slot-centric generative model for audio separation [26.51135156983783]
We present AudioSlots, a slot-centric generative model for blind source separation in the audio domain.
We train the model in an end-to-end manner using a permutation-equivariant loss function.
Our results on Libri2Mix speech separation constitute a proof of concept that this approach shows promise.
arXiv Detail & Related papers (2023-05-09T16:28:07Z)
- Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques.
Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders.
We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)
- Are disentangled representations all you need to build speaker anonymization systems? [0.0]
Speech signals contain a lot of sensitive information, such as the speaker's identity.
Speaker anonymization aims to transform a speech signal to remove the source speaker's identity while leaving the spoken content unchanged.
arXiv Detail & Related papers (2022-08-22T07:51:47Z)
- On the invertibility of a voice privacy system using embedding alignment [0.0]
This paper explores various attack scenarios on a voice anonymization system using embedding alignment techniques.
We compute the optimal rotation and compare the results of this approximation to the official VoicePrivacy Challenge results (a generic rotation-alignment sketch follows this list).
arXiv Detail & Related papers (2021-10-08T14:43:47Z)
- Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020 [19.420608243033794]
We present a Distribution-Preserving Voice Anonymization technique, as our submission to the VoicePrivacy Challenge 2020.
We show how this approach generates X-vectors that more closely follow the expected intra-similarity distribution of organic speaker X-vectors.
arXiv Detail & Related papers (2020-10-26T09:53:56Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training sequence encoders.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- Design Choices for X-vector Based Speaker Anonymization [48.46018902334472]
We present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge.
Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.
arXiv Detail & Related papers (2020-05-18T11:32:14Z)
- Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
arXiv Detail & Related papers (2020-05-05T09:02:25Z)
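As flagged in the embedding-alignment entry above, the classical way to compute an optimal rotation between two sets of paired embeddings is the orthogonal Procrustes solution via SVD. The sketch below illustrates that generic technique only; it is not claimed to be the exact attack procedure of the cited paper.

```python
# Generic orthogonal-Procrustes sketch: recover the orthogonal matrix R
# minimising ||X @ R - Y||_F for paired embedding matrices X and Y.
# An illustration of the classical technique, not the cited paper's code.
import numpy as np

def optimal_rotation(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """X, Y have shape (n_vectors, dim); rows are paired embeddings."""
    U, _, Vt = np.linalg.svd(X.T @ Y)  # SVD of the cross-covariance
    return U @ Vt

# Toy usage: recover a random orthogonal map from noisy paired observations.
rng = np.random.default_rng(0)
dim = 16
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))  # ground-truth map
X = rng.standard_normal((200, dim))                   # original embeddings
Y = X @ Q + 0.01 * rng.standard_normal((200, dim))    # transformed + noise
print(np.allclose(optimal_rotation(X, Y), Q, atol=0.05))  # True
```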