Related papers: Design Choices for X-vector Based Speaker Anonymization

Design Choices for X-vector Based Speaker Anonymization

URL: http://arxiv.org/abs/2005.08601v1
Date: Mon, 18 May 2020 11:32:14 GMT
Title: Design Choices for X-vector Based Speaker Anonymization
Authors: Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aur\'elien Bellet, Marc Tommasi
Abstract summary: We present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.
Score: 48.46018902334472
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker. In this paper, we present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. We explore several design choices for the distance metric between speakers, the region of x-vector space where the pseudo-speaker is picked, and gender selection. To assess the strength of anonymization achieved, we consider attackers using an x-vector based speaker verification system who may use original or anonymized speech for enrollment, depending on their knowledge of the anonymization scheme. The Equal Error Rate (EER) achieved by the attackers and the decoding Word Error Rate (WER) over anonymized data are reported as the measures of privacy and utility. Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.

Related papers

Speaker Embeddings to Improve Tracking of Intermittent and Moving Speakers [53.12031345322412]
We propose to perform identity reassignment post-tracking, using speaker embeddings.<n>Beamforming is used to enhance the signal towards the speakers' positions in order to compute speaker embeddings.<n>We evaluate the performance of the proposed speaker embedding-based identity reassignment method on a dataset where speakers change position during inactivity periods.
arXiv Detail & Related papers (2025-06-23T13:02:20Z)
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives. We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources. We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z)
A Benchmark for Multi-speaker Anonymization [9.990701310620368]
We present an attempt to provide a multi-speaker anonymization benchmark for real-world applications. A cascaded system uses speaker diarization to aggregate the speech of each speaker and speaker anonymization to conceal speaker privacy and preserve speech content. Experiments conducted on both non-overlap simulated and real-world datasets demonstrate the effectiveness of the multi-speaker anonymization system.
arXiv Detail & Related papers (2024-07-08T04:48:43Z)
Provably Secure Disambiguating Neural Linguistic Steganography [66.30965740387047]
The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods.
arXiv Detail & Related papers (2024-03-26T09:25:57Z)
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques [1.2691047660244337]
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization.
arXiv Detail & Related papers (2023-08-05T16:14:17Z)
Vocoder drift compensation by x-vector alignment in speaker anonymisation [11.480724899031149]
This paper explores the origin of so-called vocoder drift and shows that it is due to the mismatch between the substituted x-vector and the original representations of the linguistic content, intonation and prosody. Also reported is an original approach to vocoder drift compensation.
arXiv Detail & Related papers (2023-07-17T11:38:35Z)
Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem. We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
Privacy-Preserving Speech Representation Learning using Vector Quantization [0.0]
Speech signals contain a lot of sensitive information, such as the speaker's identity, which raises privacy concerns. This paper aims to produce an anonymous representation while preserving speech recognition performance.
arXiv Detail & Related papers (2022-03-15T14:01:11Z)
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings [66.50782702086575]
We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. The proposed system is designed to handle meetings with unknown numbers of speakers, using variable-number permutation-invariant cross-entropy based loss functions.
arXiv Detail & Related papers (2021-05-05T14:55:29Z)
Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders. Experimental results show that combining adversarial learning and autoencoders increase the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020 [19.420608243033794]
We present a Distribution-Preserving Voice Anonymization technique, as our submission to the VoicePrivacy Challenge 2020. We show how this approach generates X-vectors that more closely follow the expected intra-similarity distribution of organic speaker X-vectors.
arXiv Detail & Related papers (2020-10-26T09:53:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.