Related papers: Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples

URL: http://arxiv.org/abs/2211.05446v1
Date: Thu, 10 Nov 2022 09:35:58 GMT
Title: Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples
Authors: Meng Chen, Li Lu, Jiadi Yu, Yingying Chen, Zhongjie Ba, Feng Lin, Kui Ren
Abstract summary: We propose a voice de-identification system to balance the privacy and utility of voice services. Our system achieves 98% and 79% successful de-identification on mainstream ASIs and commercial systems, respectively.
Score: 32.3274243128532
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Faced with the threat of identity leakage during voice data publishing, users are caught in a privacy-utility dilemma when enjoying convenient voice services. Existing studies employ direct modification or text-based re-synthesis to de-identify users' voices, but these result in inconsistent audibility in the presence of human participants. In this paper, we propose a voice de-identification system that uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. Benefiting from this, our system can protect user identity from exposure by Automatic Speaker Identification (ASI) while retaining the perceptual quality of the voice for non-intrusive de-identification. Moreover, our system learns a compact speaker distribution through a conditional variational auto-encoder to sample diverse target embeddings on demand. Combining diverse target generation and input-specific perturbation construction, our system enables any-to-any identity transformation for adaptive de-identification. Experimental results show that our system achieves 98% and 79% successful de-identification on mainstream ASIs and commercial systems, respectively, with an objective Mel cepstral distortion of 4.31 dB and a subjective mean opinion score of 4.48.

Related papers

VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks [51.68795949691009]
We introduce VoxGuard, a framework grounded in differential privacy and membership inference. For attributes, we show that simple transparent attacks recover gender and accent with near-perfect accuracy even after anonymization. Our results demonstrate that EER substantially underestimates leakage, highlighting the need for low-FPR evaluation.
arXiv Detail & Related papers (2025-09-22T20:57:48Z)
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack [10.019452425301303]
Adversarial perturbations in speech pose a serious threat to automatic speech recognition (ASR) and speaker verification. We analyze adversarial audio at the phonetic level and show that perturbations exploit systematic confusions such as vowel centralization and consonant substitutions. Results across 16 phonetically diverse target phrases demonstrate that adversarial audio induces both transcription errors and identity drift.
arXiv Detail & Related papers (2025-09-18T21:19:53Z)
Evaluating Identity Leakage in Speaker De-Identification Systems [1.7699344561127388]
Speaker de-identification aims to conceal a speaker's identity while preserving intelligibility of the underlying speech. We introduce a benchmark that quantifies residual identity leakage with three complementary error rates. Evaluation results reveal that all state-of-the-art speaker de-identification systems leak identity information.
arXiv Detail & Related papers (2025-08-19T17:20:25Z)
Analyzing and Improving Speaker Similarity Assessment for Speech Synthesis [20.80178325643714]
In generative speech systems, identity is often assessed using automatic speaker verification (ASV) embeddings. We find that widely used ASV embeddings focus mainly on static features like timbre and pitch range, while neglecting dynamic elements such as rhythm. To address these gaps, we propose U3D, a metric that evaluates speakers' dynamic rhythm patterns.
arXiv Detail & Related papers (2025-07-02T22:16:42Z)
CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition [49.27067541740956]
We present CO-VADA, a Confidence-Oriented Voice Augmentation Debiasing Approach that mitigates bias without modifying model architecture or relying on demographic information. CO-VADA identifies training samples that reflect bias patterns present in the training data and then applies voice conversion to alter irrelevant attributes and generate samples. Our framework is compatible with various SER models and voice conversion tools, making it a scalable and practical solution for improving fairness in SER systems.
arXiv Detail & Related papers (2025-06-06T13:25:56Z)
RELIC: Investigating Large Language Model Responses using Self-Consistency [58.63436505595177]
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. We propose an interactive system that helps users gain insight into the reliability of the generated text.
arXiv Detail & Related papers (2023-11-28T14:55:52Z)
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques [1.2691047660244337]
The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. This thesis proposes solutions for anonymizing speech and evaluating the degree of the anonymization.
arXiv Detail & Related papers (2023-08-05T16:14:17Z)
Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample. The proposed two-stage method uses contrastive learning to pretrain the audio representation model. Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
Symmetric Saliency-based Adversarial Attack To Speaker Identification [17.087523686496958]
We propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED). First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system. Second, it proposes an angular loss function to push the speaker embedding far away from the source speaker.
arXiv Detail & Related papers (2022-10-30T08:54:02Z)
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion [0.0]
DeID-VC is a speaker de-identification system that converts a real speaker to pseudo speakers. With the help of PSG, DeID-VC can assign unique pseudo speakers at speaker level or even at utterance level.
arXiv Detail & Related papers (2022-09-09T21:13:08Z)
An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues through an attribute-selection mechanism. Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes. Our proposed method achieves competitive performance on identity-free SER and better performance on emotionless SV.
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
Personalized Keyphrase Detection using Speaker and Environment Information [24.766475943042202]
We introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary. The system is implemented with an end-to-end trained automatic speech recognition (ASR) model and a text-independent speaker verification model.
arXiv Detail & Related papers (2021-04-28T18:50:19Z)
Voice Privacy with Smart Digital Assistants in Educational Settings [1.8369974607582578]
We design and evaluate a practical and efficient framework for voice privacy at the source. The approach combines speaker identification (SID) and speech conversion methods to randomly disguise the identity of users right on the device that records the speech. We evaluate the ASR performance of the conversion in terms of word error rate and show the promise of this framework in preserving the content of the input speech.
arXiv Detail & Related papers (2021-03-24T19:58:45Z)
Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders. Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
Design Choices for X-vector Based Speaker Anonymization [48.46018902334472]
We present a flexible pseudo-speaker selection technique as a baseline for the first VoicePrivacy Challenge. Experiments are performed using datasets derived from LibriSpeech to find the optimal combination of design choices in terms of privacy and utility.
arXiv Detail & Related papers (2020-05-18T11:32:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.