QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
- URL: http://arxiv.org/abs/2305.14097v2
- Date: Sat, 23 Sep 2023 15:19:46 GMT
- Title: QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
- Authors: Guangke Chen, Yedi Zhang, Zhe Zhao, Fu Song
- Abstract summary: Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS.
We propose QFA2SR, an effective and imperceptible query-free black-box attack, by leveraging the transferability of adversarial voices.
QFA2SR is highly effective when launched over the air against three widespread voice assistants, with 60%, 46%, and 70% targeted transferability, respectively.
- Score: 7.924452626448202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current adversarial attacks against speaker recognition systems (SRSs)
require either white-box access or heavy black-box queries to the target SRS,
thus still falling behind practical attacks against proprietary commercial APIs
and voice-controlled devices. To fill this gap, we propose QFA2SR, an effective
and imperceptible query-free black-box attack, by leveraging the
transferability of adversarial voices. To improve transferability, we present
three novel methods, tailored loss functions, SRS ensemble, and time-freq
corrosion. The first one tailors loss functions to different attack scenarios.
The latter two augment surrogate SRSs in two different ways. SRS ensemble
combines diverse surrogate SRSs with new strategies, amenable to the unique
scoring characteristics of SRSs. Time-freq corrosion augments surrogate SRSs by
incorporating well-designed time-/frequency-domain modification functions,
which simulate and approximate the decision boundary of the target SRS and
distortions introduced during over-the-air attacks. QFA2SR boosts the targeted
transferability by 20.9%-70.7% on four popular commercial APIs (Microsoft
Azure, iFlytek, Jingdong, and TalentedSoft), significantly outperforming
existing attacks in the query-free setting, with negligible effect on the
imperceptibility. QFA2SR is also highly effective when launched over the air
against three widespread voice assistants (Google Assistant, Apple Siri, and
TMall Genie) with 60%, 46%, and 70% targeted transferability, respectively.
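The abstract describes time-freq corrosion only at a high level: surrogate SRSs are augmented with time-/frequency-domain modification functions that simulate over-the-air distortions. A minimal sketch of what composing such functions could look like follows; all function names, parameters, and the specific distortions are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def time_shift(x, max_shift=160):
    """Randomly shift the waveform in time, loosely simulating playback latency."""
    s = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(x, s)  # circular shift as a simple approximation

def add_noise(x, snr_db=20.0):
    """Add white Gaussian noise at a target SNR, simulating ambient noise."""
    noise = np.random.randn(len(x))
    scale = np.sqrt(np.sum(x ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db / 10)))
    return x + scale * noise

def band_drop(x, sr=16000, lo=4000.0, hi=6000.0):
    """Zero out a frequency band via FFT masking, simulating loudspeaker band-limiting."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spec[(freqs >= lo) & (freqs <= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def corrode(x):
    """Compose the modification functions, in the spirit of time-freq corrosion."""
    return band_drop(add_noise(time_shift(x)))
```

In the paper's framing, applying such transformations while optimizing the adversarial voice forces the perturbation to survive distortions the target SRS and the air channel may introduce, which is what improves transferability.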
Related papers
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
Speaker-regularized spectral basis embedding (SBE) features exploit a special regularization term to enforce homogeneity of speaker features in adaptation.
Feature-based learning hidden unit contributions (f-LHUC) are conditioned on VR-LH features, which are shown to be insensitive to speaker-level data quantity in test-time adaptation.
arXiv Detail & Related papers (2024-07-08T18:20:24Z) - Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems [27.281231584238824]
Black-box adversarial example attacks pose significant threats to real-world ASR systems.
We propose ZQ-Attack, a transfer-based adversarial attack on ASR systems.
In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91dB.
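The SNR figure quoted above (21.91 dB) measures perturbation size relative to the clean signal. A generic way to compute it (a sketch, not ZQ-Attack's code; the function name is illustrative):

```python
import numpy as np

def snr_db(clean, adversarial):
    """Signal-to-noise ratio of an adversarial example in decibels.
    The 'signal' is the clean waveform; the 'noise' is the added perturbation."""
    noise = adversarial - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

Higher values mean a smaller, less audible perturbation.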
arXiv Detail & Related papers (2024-06-27T16:39:36Z) - Query Provenance Analysis for Robust and Efficient Query-based Black-box Attack Defense [11.32992178606254]
We propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient Stateful Defense Models (SDMs).
QPA encapsulates the historical relationships among queries as the sequence feature to capture the fundamental difference between benign and adversarial query sequences.
We evaluate QPA compared with two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms.
arXiv Detail & Related papers (2024-05-31T06:56:54Z) - MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z) - Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition [13.163192823774624]
Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns.
We present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks on speaker recognition.
We demonstrate that the proposed novel feature-level transformation combined with adversarial training is rather effective compared to the sole adversarial training in a complete white-box setting.
arXiv Detail & Related papers (2022-06-07T15:38:27Z) - AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems [15.013763364096638]
Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) against adversarial attacks.
We present AS2T, the first attack in this domain which covers all the settings.
We study the possible distortions occurred in over-the-air transmission, utilize different transformation functions with different parameters to model those distortions, and incorporate them into the generation of adversarial voices.
arXiv Detail & Related papers (2022-06-07T14:38:55Z) - Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances.
Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations.
To help test the correctness of ASRs, we propose techniques that automatically generate black-box, untargeted adversarial test inputs.
arXiv Detail & Related papers (2021-12-03T10:21:47Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
arXiv Detail & Related papers (2020-11-17T07:38:26Z) - Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks [64.03012884804458]
We propose a versatile framework based on random search, Sparse-RS, for sparse targeted and untargeted attacks in the black-box setting.
Sparse-RS does not rely on substitute models and achieves state-of-the-art success rate and query efficiency for multiple sparse attack models.
arXiv Detail & Related papers (2020-06-23T08:50:37Z) - Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.