You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks
- URL: http://arxiv.org/abs/2506.09521v1
- Date: Wed, 11 Jun 2025 08:46:18 GMT
- Title: You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks
- Authors: Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters
- Abstract summary: We assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenges the reliance on global EER for privacy evaluations.
- Score: 9.235490630909323
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%, based solely on the textual content of their utterances. Our explainability study reveals that the system decisions are linked to semantically similar keywords within utterances, stemming from how LibriSpeech is curated. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenges the reliance on global EER for privacy evaluations.
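To make the attack concrete, below is a minimal sketch of a text-only verification attack in the spirit of the abstract: transcripts of an enrollment and a trial utterance are embedded with a BERT model, cosine similarity serves as the verification score, and the EER is computed over target and non-target trials. This is not the authors' exact pipeline; the model name ("bert-base-uncased"), mean pooling, cosine scoring, and the EER sweep are illustrative assumptions.

```python
# Sketch of a BERT-based, text-only ASV attack (illustrative assumptions,
# not the paper's exact implementation).
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(text: str) -> np.ndarray:
    """Mean-pool BERT token embeddings into one utterance-level vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

def asv_score(enroll_text: str, trial_text: str) -> float:
    """Cosine similarity between transcript embeddings as the ASV score."""
    a, b = embed(enroll_text), embed(trial_text)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def equal_error_rate(target_scores, nontarget_scores) -> float:
    """EER: operating point where false-acceptance and false-rejection rates meet."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores)]  # sweep the threshold from low to high
    frr = np.cumsum(labels) / labels.sum()                   # targets rejected so far
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()   # non-targets still accepted
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2)
```

Under this reading, the attack succeeds when a speaker's utterances share topical keywords (as the paper observes for LibriSpeech), since target trials then score systematically higher than non-target trials.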
Related papers
- Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction [87.49303116989708]
We explore the potential of pre-trained speech-language models (PSLMs) and pre-trained language models (PLMs) as auxiliary knowledge sources for AV-TSE. In this study, we propose incorporating linguistic constraints from PSLMs or PLMs into the AV-TSE model as additional supervision signals. Without any extra computational cost during inference, the proposed approach consistently improves speech quality and intelligibility.
arXiv Detail & Related papers (2025-06-11T14:36:26Z)
- Probing the Feasibility of Multilingual Speaker Anonymization [28.445925953669825]
We extend a state-of-the-art anonymization system to nine languages.
Experiments testing the robustness of the anonymized speech against privacy attacks and measuring speech deterioration show that the system succeeds overall for all languages.
arXiv Detail & Related papers (2024-07-03T09:12:53Z)
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
- Evaluation of Speaker Anonymization on Emotional Speech [9.223908421919733]
Speech data carries a range of personal information, such as the speaker's identity and emotional state.
Current studies have addressed the topic of preserving speech privacy.
The VoicePrivacy 2020 Challenge (VPC) focuses on speaker anonymization.
arXiv Detail & Related papers (2023-04-15T20:50:29Z)
- Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition? [86.53044183309824]
We study which factor leads to the success of self-supervised learning on speaker-related tasks.
Our empirical results on the VoxCeleb-1 dataset suggest that the benefit of SSL to the SV task comes from a combination of the masked speech prediction loss, data scale, and model size.
arXiv Detail & Related papers (2022-04-27T08:35:57Z)
- The VoicePrivacy 2022 Challenge Evaluation Plan [46.807999940446294]
Training, development and evaluation datasets are provided.
Participants apply the anonymization systems they have developed.
Results will be presented at a workshop held in conjunction with INTERSPEECH 2022.
arXiv Detail & Related papers (2022-03-23T15:05:18Z)
- Differentially Private Speaker Anonymization [44.90119821614047]
Sharing real-world speech utterances is key to the training and deployment of voice-based services.
Speaker anonymization aims to remove speaker information from a speech utterance while leaving its linguistic and prosodic attributes intact.
We show that disentanglement is indeed not perfect: linguistic and prosodic attributes still contain speaker information.
arXiv Detail & Related papers (2022-02-23T23:20:30Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on the public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)