Evaluating Identity Leakage in Speaker De-Identification Systems
- URL: http://arxiv.org/abs/2508.14012v1
- Date: Tue, 19 Aug 2025 17:20:25 GMT
- Title: Evaluating Identity Leakage in Speaker De-Identification Systems
- Authors: Seungmin Seo, Oleg Aulov, Afzal Godil, Kevin Mangold,
- Abstract summary: Speaker de-identification aims to conceal a speaker's identity while preserving intelligibility of the underlying speech. We introduce a benchmark that quantifies residual identity leakage with three complementary error rates. Evaluation results reveal that all state-of-the-art speaker de-identification systems leak identity information.
- Score: 1.7699344561127388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speaker de-identification aims to conceal a speaker's identity while preserving intelligibility of the underlying speech. We introduce a benchmark that quantifies residual identity leakage with three complementary error rates: equal error rate, cumulative match characteristic hit rate, and embedding-space similarity measured via canonical correlation analysis and Procrustes analysis. Evaluation results reveal that all state-of-the-art speaker de-identification systems leak identity information. The highest performing system in our evaluation performs only slightly better than random guessing, while the lowest performing system achieves a 45% hit rate within the top 50 candidates based on CMC. These findings highlight persistent privacy risks in current speaker de-identification technologies.
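The three leakage measures named in the abstract can be illustrated with a short NumPy sketch. This is a hypothetical illustration, not the authors' benchmark. The following are all assumptions:

- the function names;
- the premise that an attacker's speaker-recognition system supplies similarity scores (higher means "same speaker");
- the threshold sweep used for EER;
- the QR/SVD formulations of CCA and Procrustes analysis.

Read this way, an EER near 50% or a CMC hit rate near chance suggests that little usable identity signal survives de-identification. Canonical correlations near 1, or a Procrustes disparity near 0, between paired original and de-identified embeddings suggest the speaker structure was largely preserved.

```python
"""Hypothetical sketch of three identity-leakage measures (not the paper's code)."""
import numpy as np


def equal_error_rate(genuine, impostor):
    """EER from attacker similarity scores: genuine = same-speaker trials,
    impostor = different-speaker trials. Higher score means "same speaker"."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejects
    # EER is where the two error curves cross; take the closest sampled point.
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2)


def cmc_hit_rate(similarity, probe_labels, gallery_labels, k):
    """Fraction of probes whose true speaker is among the top-k gallery
    candidates. similarity has shape (n_probes, n_gallery)."""
    similarity = np.asarray(similarity, dtype=float)
    top_k = np.argsort(-similarity, axis=1)[:, :k]  # k best candidates per probe
    top_labels = np.asarray(gallery_labels)[top_k]
    hits = (top_labels == np.asarray(probe_labels)[:, None]).any(axis=1)
    return float(hits.mean())


def canonical_correlations(X, Y):
    """Canonical correlations between two embedding sets with paired rows
    (e.g. original vs. de-identified utterances). Assumes n_rows > n_dims."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two centered column spaces, i.e. the canonical correlations.
    return np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), 0.0, 1.0)


def procrustes_disparity(X, Y):
    """Squared residual after the best translation, scaling and orthogonal
    transform of Y onto X (0 = same shape, near 1 = unrelated). Same shapes."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    X0 = X0 / np.linalg.norm(X0)  # unit Frobenius norm
    Y0 = Y0 / np.linalg.norm(Y0)
    s = np.linalg.svd(X0.T @ Y0, compute_uv=False)
    return float(1.0 - s.sum() ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 16))                  # "original" embeddings
    Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal map
    Y = X @ Q + 0.05 * rng.normal(size=X.shape)     # leaky "de-identified" copy
    print("mean CCA:", canonical_correlations(X, Y).mean())
    print("Procrustes disparity:", procrustes_disparity(X, Y))
```

The demo builds a deliberately leaky case: a rotated, slightly noised copy of the original embeddings. Both embedding-space measures should therefore flag strong similarity. If SciPy is available, `scipy.spatial.procrustes` reports a comparable disparity value.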
Related papers
- Assessing the Impact of Speaker Identity in Speech Spoofing Detection [1.7816843507516946]
Spoofing detection systems are typically trained using diverse recordings from multiple speakers.
In this paper, we investigate the impact of speaker information on spoofing detection systems.
We propose two approaches within our Speaker-Invariant Multi-Task framework.
Date: 2026-02-24T11:45:41Z
- VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks [51.68795949691009]
We introduce VoxGuard, a framework grounded in differential privacy and membership inference.
For attributes, we show that simple transparent attacks recover gender and accent with near-perfect accuracy even after anonymization.
Our results demonstrate that EER substantially underestimates leakage, highlighting the need for low-FPR evaluation.
Date: 2025-09-22T20:57:48Z
- SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions [54.34001921326444]
Speaker verification (SV) models are increasingly integrated into security, personalization, and access control systems.
Existing benchmarks evaluate only subsets of these conditions, missing others entirely.
We introduce SVeritas, a comprehensive Speaker Verification tasks benchmark suite, assessing SV systems under stressors like recording duration, spontaneity, content, noise, microphone distance, reverberation, channel mismatches, audio bandwidth, codecs, speaker age, and susceptibility to spoofing and adversarial attacks.
Date: 2025-09-21T14:11:16Z
- Towards Low-Latency Tracking of Multiple Speakers With Short-Context Speaker Embeddings [52.985061676464554]
We propose a Knowledge Distillation based training approach for short context speaker embedding extraction.
We leverage the spatial information of the speaker of interest using beamforming to reduce overlap.
Results demonstrate that our models are effective at short-context embedding extraction and more robust to overlap.
Date: 2025-08-18T11:32:13Z
- AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation [55.607230723223346]
This work presents a systematic study of Large Audio Model (LAM) as a Judge, AudioJudge, investigating whether it can provide a unified evaluation framework that addresses both challenges.
We explore AudioJudge across audio characteristic detection tasks, including pronunciation, speaking rate, speaker identification and speech quality, and system-level human preference simulation for automated benchmarking.
We introduce a multi-aspect ensemble AudioJudge to enable general-purpose multi-aspect audio evaluation. This method decomposes speech assessment into specialized judges for lexical content, speech quality, and paralinguistic features, achieving up to 0.91 Spearman correlation with human preferences on
Date: 2025-07-17T00:39:18Z
- Investigating Confidence Estimation Measures for Speaker Diarization [4.679826697518427]
Speaker diarization systems segment a conversation recording based on the speakers' identity.
Speaker diarization errors propagate to, and can adversely affect, downstream systems that rely on the speaker's identity.
One way to mitigate these errors is to provide segment-level diarization confidence scores to downstream systems.
Date: 2024-06-24T20:21:38Z
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
Date: 2024-01-07T08:59:32Z
- Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition [4.143603294943441]
We show the generalization problem that arises when fixed thresholds (computed using the EER metric) are used for imposter identification in unseen speaker recognition.
We then introduce a robust speaker-specific thresholding technique for better performance.
We show the efficacy of the proposed techniques on VoxCeleb1, VCTK and the FFSVC 2022 datasets, beating the baselines by up to 10%.
Date: 2023-06-01T17:49:58Z
- Privacy-Utility Balanced Voice De-Identification Using Adversarial Examples [32.3274243128532]
We propose a voice de-identification system to balance the privacy and utility of voice services.
Our system achieves 98% and 79% successful de-identification on mainstream ASIs and commercial systems, respectively.
Date: 2022-11-10T09:35:58Z
- Symmetric Saliency-based Adversarial Attack To Speaker Identification [17.087523686496958]
We propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED).
First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system.
Second, it proposes an angular loss function to push the speaker embedding far away from the source speaker.
Date: 2022-10-30T08:54:02Z
- Text Independent Speaker Identification System for Access Control [0.0]
Even human listeners cannot identify a specific individual's speech with 100% accuracy.
This paper presents a text-independent speaker identification system that employs Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and k-Nearest Neighbor (kNN) for classification.
Date: 2022-09-26T14:42:18Z
- Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression.
This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
Date: 2022-06-23T12:50:55Z
- FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
Date: 2020-11-17T07:38:26Z