Real-time, Universal, and Robust Adversarial Attacks Against Speaker
Recognition Systems
- URL: http://arxiv.org/abs/2003.02301v2
- Date: Fri, 1 May 2020 02:33:22 GMT
- Title: Real-time, Universal, and Robust Adversarial Attacks Against Speaker
Recognition Systems
- Authors: Yi Xie, Cong Shi, Zhuohang Li, Jian Liu, Yingying Chen, Bo Yuan
- Abstract summary: We propose the first real-time, universal, and robust adversarial attack against the state-of-the-art deep neural network (DNN) based speaker recognition system.
Experiments using a public dataset of 109 English speakers demonstrate the effectiveness and robustness of our proposed attack, with a high attack success rate of over 90%.
- Score: 21.559732692440424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the popularity of voice user interfaces (VUIs) has exploded
in recent years, speaker recognition systems have emerged as an important
means of identifying speakers in many security-sensitive applications and
services. In this paper, we propose the first real-time, universal, and
robust adversarial attack against state-of-the-art deep neural network
(DNN) based speaker recognition systems. By adding an audio-agnostic
universal perturbation to an arbitrary enrolled speaker's voice input, the
DNN-based speaker recognition system is made to identify the speaker as any
target (i.e., adversary-desired) speaker label. In addition, we improve the
robustness of our attack by modeling the sound distortions caused by
physical over-the-air propagation through estimation of the room impulse
response (RIR). Experiments using a public dataset of 109 English speakers
demonstrate the effectiveness and robustness of our proposed attack, with a
high attack success rate of over 90%. The attack launching time also
achieves a 100X speedup over contemporary non-universal attacks.
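The two key ingredients here, a single audio-agnostic perturbation and RIR-based over-the-air modeling, can be sketched roughly as follows. This is a hypothetical PyTorch illustration, not the authors' code: `model`, `loader`, `rirs`, and all hyperparameters are placeholders.

```python
import random

import torch
import torch.nn.functional as F

def train_universal_perturbation(model, loader, rirs, target_label,
                                 epsilon=0.05, steps=1000, lr=1e-3):
    """Optimize one audio-agnostic perturbation that pushes any utterance
    toward the adversary-chosen target speaker, simulating over-the-air
    playback by convolving with random room impulse responses (RIRs).
    Hypothetical sketch, not the authors' code."""
    delta = torch.zeros(1, 16000, requires_grad=True)   # 1 s at 16 kHz
    opt = torch.optim.Adam([delta], lr=lr)
    for _, (wave, _) in zip(range(steps), loader):      # wave: (batch, 16000)
        rir = random.choice(rirs)                       # rir: 1-D tensor
        adv = (wave + delta).unsqueeze(1)               # add channel dim
        # true convolution = cross-correlation with a flipped kernel
        adv = F.conv1d(adv, rir.flip(0).view(1, 1, -1),
                       padding=rir.numel() - 1)[..., :wave.shape[-1]]
        loss = F.cross_entropy(model(adv.squeeze(1)),
                               torch.full((wave.shape[0],), target_label,
                                          dtype=torch.long))
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)             # loudness budget
    return delta.detach()
```

Because the perturbation is optimized once over many utterances and rooms, launching the attack later costs nothing beyond playback, which is where the real-time claim comes from.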
Related papers
- Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack raises even more serious security issues for these systems.
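The core optimization described above, maximizing raw embedding similarity between a seed sample and a proxy population, might look roughly like this sketch; `embed_fn` is a stand-in for any speaker-embedding network, and the objective is only our reading of the summary.

```python
import torch
import torch.nn.functional as F

def craft_master_voice(embed_fn, seed_wave, proxy_waves, steps=500, lr=1e-3):
    """Adversarially nudge a seed utterance so its speaker embedding is
    close to as many speakers as possible. Hypothetical sketch."""
    with torch.no_grad():  # fixed proxy population of speaker embeddings
        proxy_emb = F.normalize(
            torch.stack([embed_fn(w) for w in proxy_waves]), dim=-1)
    adv = seed_wave.clone().requires_grad_(True)
    opt = torch.optim.Adam([adv], lr=lr)
    for _ in range(steps):
        emb = F.normalize(embed_fn(adv), dim=-1)
        # maximize average cosine similarity to the proxy population
        loss = -(proxy_emb @ emb).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return adv.detach()
```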
arXiv Detail & Related papers (2022-04-24T15:31:41Z)
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network [60.99112031408449]
We propose a real-time, single-channel attention-guided Convolutional Neural Network (CNN) to estimate the number of active speakers in overlapping speech.
The proposed system extracts higher-level information from the speech spectral content using a CNN model.
Experiments on simulated overlapping speech using the WSJ corpus show that the attention solution improves performance by almost 3% absolute over conventional temporal average pooling.
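For intuition, attention-guided pooling of the kind credited here with the gain over temporal average pooling can be written in a few lines; this is a generic illustrative module, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Attention-weighted pooling over time: frames vote on their own
    importance instead of being averaged uniformly. Illustrative sizes."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                           # x: (batch, time, dim)
        w = torch.softmax(self.score(x), dim=1)     # per-frame weights
        return (w * x).sum(dim=1)                   # (batch, dim)

# the temporal average pooling baseline is simply: x.mean(dim=1)
```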
arXiv Detail & Related papers (2021-10-30T19:24:57Z)
- Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems [26.519207339530478]
Adversarial attacks attempt to force misclassification by adding small perturbations to the original speech signal.
We propose to counteract this by employing a neural-network based denoiser as a pre-processor in the ASR pipeline.
We found that training the denoiser using a perceptually motivated loss function resulted in increased adversarial robustness.
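One plausible way to instantiate a perceptually motivated loss is a log-mel spectrogram distance; the sketch below is an assumption-laden stand-in, since both the denoiser architecture and the exact loss are placeholders, not the paper's.

```python
import torch
import torch.nn as nn
import torchaudio

# Log-mel spectrogram distance as a rough proxy for perceptual similarity.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

def perceptual_loss(denoised, clean, eps=1e-6):
    return (torch.log(mel(denoised) + eps)
            - torch.log(mel(clean) + eps)).abs().mean()

denoiser = nn.Sequential(                    # placeholder pre-processor
    nn.Conv1d(1, 32, 9, padding=4), nn.ReLU(),
    nn.Conv1d(32, 1, 9, padding=4),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def train_step(adv_wave, clean_wave):        # both: (batch, samples)
    out = denoiser(adv_wave.unsqueeze(1)).squeeze(1)
    loss = perceptual_loss(out, clean_wave)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

At inference, the denoiser simply runs ahead of the ASR model, so the pipeline itself needs no retraining.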
arXiv Detail & Related papers (2021-07-12T07:00:06Z)
- Attack on practical speaker verification system using universal adversarial perturbations [20.38185341318529]
This work shows that by playing our crafted adversarial perturbation as a separate source while the adversary is speaking, a practical speaker verification system will misidentify the adversary as the target speaker.
A two-step algorithm is proposed to optimize the universal adversarial perturbation so that it is text-independent and has little effect on recognition of the authentication text.
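Since the perturbation is played as a separate source, the attack reduces, to first order, to optimizing an additive signal over many of the adversary's utterances; a rough sketch under that assumption follows, where `sv_model` and `target_emb` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def optimize_separate_source_uap(sv_model, adversary_utts, target_emb,
                                 epsilon=0.02, steps=2000, lr=1e-3):
    """Optimize one perturbation that, mixed with whatever the adversary
    says (text independence), drags the verification embedding toward the
    target speaker. sv_model: 1-D waveform -> embedding. Hypothetical."""
    delta = torch.zeros_like(adversary_utts[0], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for step in range(steps):
        utt = adversary_utts[step % len(adversary_utts)]
        mixed = utt + delta                          # playback = additive mix
        emb = F.normalize(sv_model(mixed), dim=-1)
        loss = -torch.dot(emb, F.normalize(target_emb, dim=-1))
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-epsilon, epsilon)          # keep playback quiet
    return delta.detach()
```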
arXiv Detail & Related papers (2021-05-19T09:43:34Z)
- Cortical Features for Defense Against Adversarial Audio Attacks [55.61885805423492]
We propose using a computational model of the auditory cortex as a defense against adversarial attacks on audio.
We show that the cortical features help defend against universal adversarial examples.
arXiv Detail & Related papers (2021-01-30T21:21:46Z)
- FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
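The gated-convolution building block and a two-term (fool vs. hide) loss capture the flavor of the approach; the sketch below is illustrative only, with sizes and the loss weighting chosen arbitrarily rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv1d(nn.Module):
    """Gated convolution, output = conv(x) * sigmoid(gate(x)): the basic
    block of a gated convolutional autoencoder. Illustrative sizes."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        self.feat = nn.Conv1d(c_in, c_out, k, padding=k // 2)
        self.gate = nn.Conv1d(c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        return self.feat(x) * torch.sigmoid(self.gate(x))

def multi_objective_loss(logits, true_label, x_adv, x_orig, lam=10.0):
    # push the model away from the true speaker (fool) while keeping the
    # perturbation small, a crude proxy for imperceptibility (hide)
    fool = -F.cross_entropy(logits, true_label)
    hide = (x_adv - x_orig).pow(2).mean()
    return fool + lam * hide
```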
arXiv Detail & Related papers (2020-11-17T07:38:26Z)
- Speaker De-identification System using Autoencoders and Adversarial Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
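One common way to combine the two ingredients is a minimax setup: a speaker classifier is trained on the autoencoder's output while the autoencoder is trained to reconstruct the audio and defeat that classifier. The following sketch assumes that setup rather than the paper's exact recipe; both networks are placeholders.

```python
import torch.nn.functional as F

def deid_losses(autoencoder, speaker_clf, wave, speaker_label):
    """Per-batch losses for adversarial speaker de-identification.
    Hypothetical sketch: optimize clf_loss w.r.t. the classifier and
    ae_loss w.r.t. the autoencoder, alternating steps."""
    recon = autoencoder(wave)
    # classifier step: learn to identify the speaker from de-identified audio
    clf_loss = F.cross_entropy(speaker_clf(recon.detach()), speaker_label)
    # autoencoder step: stay faithful to the audio, but fool the classifier
    ae_loss = (F.mse_loss(recon, wave)
               - F.cross_entropy(speaker_clf(recon), speaker_label))
    return ae_loss, clf_loss
```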
arXiv Detail & Related papers (2020-11-09T19:22:05Z)
- SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems [28.635467696564703]
We show that the end-to-end architecture of speech and speaker systems makes attacks and defenses against them substantially different than those in the image space.
We then demonstrate experimentally that attacks against these models almost universally fail to transfer.
arXiv Detail & Related papers (2020-07-13T18:52:25Z)
- Enabling Fast and Universal Audio Adversarial Attack Using Generative Model [21.559732692440424]
We propose a fast audio adversarial perturbation generator (FAPG), which uses a generative model to produce adversarial perturbations for the audio input in a single forward pass.
We also propose a universal audio adversarial perturbation generator (UAPG).
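The speed argument is architectural: a trained generator emits the perturbation in one forward pass instead of running per-input iterative optimization. A minimal hypothetical sketch, with a made-up layer stack:

```python
import torch
import torch.nn as nn

class FAPG(nn.Module):
    """Perturbation generator: maps input audio directly to its adversarial
    perturbation. Hypothetical sketch; the real FAPG may differ."""
    def __init__(self, epsilon=0.05):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, 15, padding=7), nn.ReLU(),
            nn.Conv1d(32, 32, 15, padding=7), nn.ReLU(),
            nn.Conv1d(32, 1, 15, padding=7), nn.Tanh(),  # bounded output
        )
        self.epsilon = epsilon

    def forward(self, wave):                  # wave: (batch, 1, samples)
        # single forward pass: no per-example optimization loop
        return wave + self.epsilon * self.net(wave)
```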
arXiv Detail & Related papers (2020-04-26T00:51:54Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
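A U-Net style enhancer is an encoder-decoder with skip connections used as a front-end cleanup stage; the one-level sketch below illustrates only that pattern (U-Net$_{At}$ additionally places attention on the skips), and its layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal encoder-decoder with a residual skip connection, the U-Net
    pattern used to enhance adversarial speech before ASR. Hypothetical
    one-level sketch."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv1d(1, 16, 15, stride=2, padding=7)
        self.dec = nn.ConvTranspose1d(16, 1, 15, stride=2,
                                      padding=7, output_padding=1)

    def forward(self, x):                     # x: (batch, 1, samples)
        h = torch.relu(self.enc(x))
        return x + self.dec(h)                # skip: predict a correction
```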
arXiv Detail & Related papers (2020-03-31T02:16:34Z)