Certification of Speaker Recognition Models to Additive Perturbations
- URL: http://arxiv.org/abs/2404.18791v1
- Date: Mon, 29 Apr 2024 15:23:26 GMT
- Title: Certification of Speaker Recognition Models to Additive Perturbations
- Authors: Dmitrii Korzh, Elvir Karimov, Mikhail Pautov, Oleg Y. Rogov, Ivan Oseledets
- Abstract summary: We pioneer the application of robustness certification techniques, originally developed for the image domain, to speaker recognition.
We demonstrate the effectiveness of these methods on the VoxCeleb 1 and 2 datasets for several models.
We expect this work to improve the robustness of voice biometrics, establish a new certification benchmark, and accelerate research on certification methods in the audio domain.
- Score: 4.332441337407564
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Speaker recognition technology is applied in various tasks ranging from personal virtual assistants to secure access systems. However, the robustness of these systems against adversarial attacks, particularly additive perturbations, remains a significant challenge. In this paper, we pioneer the application of robustness certification techniques, originally developed for the image domain, to speaker recognition. We cover this gap by transferring and improving randomized smoothing certification techniques against norm-bounded additive perturbations for the classification and few-shot learning tasks in speaker recognition. We demonstrate the effectiveness of these methods on the VoxCeleb 1 and 2 datasets for several models. We expect this work to improve the robustness of voice biometrics, establish a new certification benchmark, and accelerate research on certification methods in the audio domain.
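The certification approach referenced in the abstract builds on randomized smoothing: the base speaker classifier is evaluated on many noisy copies of the input, and a sufficiently confident majority vote yields a certified L2 radius within which no additive perturbation can change the smoothed prediction. The snippet below is a minimal sketch in the style of the standard Cohen et al. CERTIFY procedure, not the paper's exact algorithm; the classifier `f`, the noise level `sigma`, and the sample counts are illustrative assumptions.

```python
# Minimal randomized-smoothing certification sketch for a speaker classifier
# operating on raw waveforms. Assumes `f` maps a 1-D waveform to an integer
# speaker label; sigma and sample counts are illustrative, not the paper's.
import numpy as np
from scipy.stats import norm, beta


def clopper_pearson_lower(k: int, n: int, alpha: float = 0.001) -> float:
    """One-sided lower confidence bound on a binomial proportion."""
    if k == 0:
        return 0.0
    return beta.ppf(alpha, k, n - k + 1)


def certify(f, waveform: np.ndarray, sigma: float = 0.5,
            n0: int = 100, n: int = 10_000, alpha: float = 0.001):
    """Return (predicted_speaker, certified_L2_radius), or (None, 0.0) on abstention."""
    # Step 1: guess the top class from a small number of noisy samples.
    guesses = [f(waveform + sigma * np.random.randn(*waveform.shape)) for _ in range(n0)]
    top_class = int(np.bincount(guesses).argmax())

    # Step 2: lower-bound P[f(x + noise) = top_class] with many more samples.
    hits = sum(f(waveform + sigma * np.random.randn(*waveform.shape)) == top_class
               for _ in range(n))
    p_lower = clopper_pearson_lower(hits, n, alpha)

    if p_lower <= 0.5:
        return None, 0.0  # cannot certify with the requested confidence

    # Any additive perturbation with ||delta||_2 < radius cannot change the
    # smoothed classifier's prediction (single-class Neyman-Pearson bound).
    radius = sigma * norm.ppf(p_lower)
    return top_class, radius
```

The abstention branch matters in practice: if the noisy majority vote is not clearly above one half, no nontrivial radius can be certified at the chosen confidence level.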
Related papers
- A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement [16.900731393703648]
Self-supervised learning (SSL) models have been found to be very effective for certain speech tasks.
In this paper, we investigate the uses of SSL representations for single-channel speech enhancement in challenging conditions.
arXiv Detail & Related papers (2024-03-03T02:05:17Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- Meta-Learning Framework for End-to-End Imposter Identification in Unseen Speaker Recognition [4.143603294943441]
We highlight the generalization problem of using fixed thresholds (computed with the EER metric) for imposter identification in unseen speaker recognition; a minimal sketch of such a threshold computation appears after this list.
We then introduce a robust speaker-specific thresholding technique for better performance.
We show the efficacy of the proposed techniques on the VoxCeleb1, VCTK and FFSVC 2022 datasets, beating the baselines by up to 10%.
arXiv Detail & Related papers (2023-06-01T17:49:58Z)
- Open-set Adversarial Defense with Clean-Adversarial Mutual Learning [93.25058425356694]
This paper demonstrates that open-set recognition systems are vulnerable to adversarial samples.
Motivated by these observations, we emphasize the necessity of an Open-Set Adversarial Defense (OSAD) mechanism.
This paper proposes an Open-Set Defense Network with Clean-Adversarial Mutual Learning (OSDN-CAML) as a solution to the OSAD problem.
arXiv Detail & Related papers (2022-02-12T02:13:55Z)
- A Review of Speaker Diarization: Recent Advances with Deep Learning [78.20151731627958]
Speaker diarization is the task of labeling audio or video recordings with classes corresponding to speaker identity.
With the rise of deep learning technology, more rapid advancements have been made for speaker diarization.
We discuss how speaker diarization systems have been integrated with speech recognition applications.
arXiv Detail & Related papers (2021-01-24T01:28:05Z)
- Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems [44.305353565981015]
This paper considers several state-of-the-art adversarial attacks against a deep speaker recognition system, employing strong defense methods as countermeasures.
Experiments show that speaker recognition systems are vulnerable to adversarial attacks, and the strongest attacks can reduce the accuracy of the system from 94% to as low as 0%.
arXiv Detail & Related papers (2020-08-18T00:58:19Z)
- SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems [28.635467696564703]
We show that the end-to-end architecture of speech and speaker systems makes attacks and defenses against them substantially different than those in the image space.
We then demonstrate experimentally that attacks against these models almost universally fail to transfer.
arXiv Detail & Related papers (2020-07-13T18:52:25Z)
- Segment Aggregation for short utterances speaker verification using raw waveforms [47.41124427552161]
We propose a method that compensates for the performance degradation of speaker verification for short utterances.
The proposed method adopts an ensemble-based design to improve the stability and accuracy of speaker verification systems.
arXiv Detail & Related papers (2020-05-07T08:57:22Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved significant performance in controlled conditions.
Speaker verification on short utterances in uncontrolled noisy environment conditions is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
- Robust Speaker Recognition Using Speech Enhancement And Attention Model [37.33388614967888]
Instead of processing speech enhancement and speaker recognition separately, the two modules are integrated into one framework through joint optimisation using deep neural networks.
To increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker-related features learned from contextual information in the time and frequency domains.
The results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines that do not use them in most acoustic conditions in our experiments.
arXiv Detail & Related papers (2020-01-14T20:03:07Z)
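As referenced in the meta-learning entry above, fixed verification thresholds are commonly set at the equal error rate (EER), the operating point where the false acceptance and false rejection rates coincide. Below is a minimal sketch of that computation; the trial scores are synthetic placeholders, whereas a real system would score pairs of speaker embeddings (for example by cosine similarity).

```python
# Hedged sketch: estimate the EER and the corresponding fixed decision
# threshold from genuine (same-speaker) and impostor (different-speaker)
# trial scores. Score distributions below are hypothetical.
import numpy as np


def eer_threshold(genuine_scores: np.ndarray, impostor_scores: np.ndarray):
    """Return (eer, threshold) where false acceptance ~= false rejection."""
    # Candidate thresholds: every observed score.
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])  # false acceptance rate
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])    # false rejection rate
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genuine = rng.normal(1.0, 0.5, 1000)   # hypothetical same-speaker scores
    impostor = rng.normal(0.0, 0.5, 1000)  # hypothetical different-speaker scores
    eer, thr = eer_threshold(genuine, impostor)
    print(f"EER = {eer:.3f} at threshold {thr:.3f}")
```

A single threshold of this kind is computed on a development set and then applied to all speakers, which is exactly the generalization weakness that speaker-specific thresholding aims to address.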
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.