QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
- URL: http://arxiv.org/abs/2305.14097v2
- Date: Sat, 23 Sep 2023 15:19:46 GMT
- Title: QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
- Authors: Guangke Chen, Yedi Zhang, Zhe Zhao, Fu Song
- Abstract summary: Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS.
We propose QFA2SR, an effective and imperceptible query-free black-box attack, by leveraging the transferability of adversarial voices.
QFA2SR is highly effective when launched over the air against three widespread voice assistants with 60%, 46%, and 70% targeted transferability, respectively.
- Score: 7.924452626448202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current adversarial attacks against speaker recognition systems (SRSs)
require either white-box access or heavy black-box queries to the target SRS,
thus still falling behind practical attacks against proprietary commercial APIs
and voice-controlled devices. To fill this gap, we propose QFA2SR, an effective
and imperceptible query-free black-box attack, by leveraging the
transferability of adversarial voices. To improve transferability, we present
three novel methods: tailored loss functions, SRS ensemble, and time-freq
corrosion. The first one tailors loss functions to different attack scenarios.
The latter two augment surrogate SRSs in two different ways. SRS ensemble
combines diverse surrogate SRSs using new strategies suited to the unique
scoring characteristics of SRSs. Time-freq corrosion augments surrogate SRSs by
incorporating well-designed time-/frequency-domain modification functions,
which simulate and approximate the decision boundary of the target SRS and
distortions introduced during over-the-air attacks. QFA2SR boosts the targeted
transferability by 20.9%-70.7% on four popular commercial APIs (Microsoft
Azure, iFlytek, Jingdong, and TalentedSoft), significantly outperforming
existing attacks in the query-free setting, with negligible effect on the
imperceptibility. QFA2SR is also highly effective when launched over the air
against three widespread voice assistants (Google Assistant, Apple Siri, and
TMall Genie) with 60%, 46%, and 70% targeted transferability, respectively.
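The abstract names the tailored loss functions and the SRS ensemble but does not define them. The Python below is a rough sketch only. It assumes the margin-style losses used in earlier attacks on SRSs for the three usual tasks:

- closed-set identification (CSI), where the target speaker must outscore every other enrolled speaker;
- speaker verification (SV), where the score must reach the decision threshold;
- open-set identification (OSI), which needs both.

It also shows one hypothetical way to combine surrogates whose raw scores live on different scales: average the losses measured against each surrogate's own threshold. The function names, the confidence margin `kappa`, and the example scores are illustrative assumptions, not QFA2SR's actual formulation.

```python
from typing import Dict, Sequence

# Hypothetical margin losses (an assumption, not QFA2SR's exact definitions).
# Each loss is > 0 while the attack has not yet succeeded and is clipped at
# -kappa once the target speaker wins by a margin of at least kappa.


def csi_loss(scores: Sequence[float], target: int, kappa: float = 0.0) -> float:
    """Closed-set identification: target must outscore every other enrolled speaker."""
    best_other = max(s for i, s in enumerate(scores) if i != target)
    return max(best_other - scores[target], -kappa)


def sv_loss(score: float, threshold: float, kappa: float = 0.0) -> float:
    """Speaker verification: the score for the claimed (target) speaker must reach the threshold."""
    return max(threshold - score, -kappa)


def osi_loss(scores: Sequence[float], target: int, threshold: float, kappa: float = 0.0) -> float:
    """Open-set identification: target must beat both the threshold and all other speakers."""
    best_other = max((s for i, s in enumerate(scores) if i != target), default=threshold)
    return max(max(threshold, best_other) - scores[target], -kappa)


def ensemble_sv_loss(scores: Dict[str, float], thresholds: Dict[str, float], kappa: float = 0.0) -> float:
    """Hypothetical ensemble: average threshold-relative SV losses over surrogate SRSs.

    Raw scores are not comparable across systems (e.g. cosine similarity vs.
    a PLDA log-likelihood ratio), so each surrogate's loss is measured against
    its own decision threshold before averaging.
    """
    losses = [sv_loss(scores[name], thresholds[name], kappa) for name in scores]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # The target speaker (index 0) still trails the best other speaker by about 0.7.
    print(csi_loss([0.2, 0.5, 0.9], target=0))
    # Only the cosine-scored surrogate still rejects the voice.
    print(ensemble_sv_loss({"cos": 0.3, "plda": 12.0}, {"cos": 0.5, "plda": 10.0}))
```

A positive `kappa` keeps pushing the target speaker past the winning margin, a heuristic often used to improve transferability. Its units are those of each surrogate's score, so a real ensemble would likely need per-model calibration.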
Related papers
- Toward Improving Synthetic Audio Spoofing Detection Robustness via Meta-Learning and Disentangled Training With Adversarial Examples [33.445126880876415]
We propose a reliable and robust spoofing detection system to filter out spoofing attacks instead of having them reach the automatic speaker verification system.
A weighted additive angular margin loss is proposed to address the data imbalance issue, and different margins are assigned to improve generalization to unseen spoofing attacks.
We craft adversarial examples by adding imperceptible perturbations to spoofing speech as a data augmentation strategy, then use an auxiliary batch normalization so that the corresponding normalization statistics are computed exclusively on the adversarial examples.
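The summary above names a weighted additive angular margin loss with class-specific margins but gives no formula. This is a minimal sketch that assumes the standard ArcFace-style additive angular margin:

- the true-class logit becomes s·cos(θ_y + m_y);
- every other class keeps s·cos θ_j;
- the resulting cross-entropy is multiplied by a class weight w_y to offset data imbalance.

The parameter names, the class layout, and the scale value are hypothetical, not the paper's settings.

```python
import math
from typing import Sequence


def weighted_aam_loss(
    cosines: Sequence[float],  # cos(theta_j) between the embedding and each class centre
    label: int,                # index of the true class (hypothetical: 0 = bona fide, 1 = spoof)
    margins: Sequence[float],  # additive angular margin m_j per class, in radians
    weights: Sequence[float],  # class weight w_j used to counter data imbalance
    scale: float = 30.0,       # logit scale s (hypothetical value)
) -> float:
    """Weighted additive angular margin softmax loss for a single sample (sketch)."""
    # Clamp before acos to guard against rounding slightly outside [-1, 1].
    theta_y = math.acos(max(-1.0, min(1.0, cosines[label])))
    logits = [scale * c for c in cosines]
    # Penalise the true class by adding its class-specific margin to the angle.
    logits[label] = scale * math.cos(theta_y + margins[label])
    # Numerically stable log-sum-exp for the softmax denominator.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Weighted cross-entropy: -w_y * log softmax(logits)[label].
    return weights[label] * (log_norm - logits[label])


if __name__ == "__main__":
    # Hypothetical setup: the spoof class gets a larger margin, the bona fide class a larger weight.
    print(weighted_aam_loss([0.7, 0.2], label=1, margins=[0.2, 0.5], weights=[1.0, 0.25]))
```

With all margins at zero and unit weights, this reduces to an ordinary scaled-cosine softmax cross-entropy.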
Date: 2024-08-23T19:26:54Z
- ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features [25.28307679567351]
ALIF is the first black-box adversarial linguistic feature-based attack pipeline.
We present ALIF-OTL and ALIF-OTA schemes for launching attacks in both the digital domain and the physical playback environment.
Date: 2024-08-03T15:30:16Z
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
The first method uses speaker-regularized spectral basis embedding (SBE) features, which exploit a special regularization term to enforce homogeneity of speaker features in adaptation.
The second applies feature-based learning hidden unit contributions (f-LHUC) conditioned on VR-LH features, which are shown to be insensitive to speaker-level data quantity in test-time adaptation.
Date: 2024-07-08T18:20:24Z
- Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems [27.281231584238824]
Black-box adversarial example attacks pose significant threats to real-world ASR systems.
We propose ZQ-Attack, a transfer-based adversarial attack on ASR systems.
In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91 dB.
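For context on the 21.91 dB figure, here is a sketch of how such an SNR is typically computed for audio adversarial examples. It assumes the common definition SNR = 10·log10(Σx² / Σδ²), where x is the original waveform and δ is the added perturbation. Higher values mean a quieter perturbation. The sample values are made up for illustration.

```python
import math
from typing import Sequence


def snr_db(clean: Sequence[float], adversarial: Sequence[float]) -> float:
    """SNR in dB of the perturbation (adversarial - clean) relative to the clean signal."""
    if len(clean) != len(adversarial):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in clean)
    noise_power = sum((a - x) ** 2 for x, a in zip(clean, adversarial))
    if noise_power == 0:
        return math.inf  # no perturbation at all
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [0.5, -0.5, 0.5, -0.5]
    adv = [0.55, -0.45, 0.55, -0.45]    # every sample shifted by 0.05
    print(round(snr_db(clean, adv), 2))  # 20.0: the perturbation carries 1/100 of the signal power
```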
Date: 2024-06-27T16:39:36Z
- Query Provenance Analysis: Efficient and Robust Defense against Query-based Black-box Attacks [11.32992178606254]
We propose a novel approach, Query Provenance Analysis (QPA), for more robust and efficient Stateful Defense Models (SDMs).
QPA encapsulates the historical relationships among queries as the sequence feature to capture the fundamental difference between benign and adversarial query sequences.
We evaluate QPA compared with two baselines, BlackLight and PIHA, on four widely used datasets with six query-based black-box attack algorithms.
Date: 2024-05-31T06:56:54Z
- Improved Generation of Adversarial Examples Against Safety-aligned LLMs [72.38072942860309]
Adversarial prompts generated with gradient-based methods perform remarkably well as automatic jailbreak attacks against safety-aligned LLMs.
In this paper, we explore a new perspective, suggesting that such gradient-based attacks can be improved by leveraging innovations inspired by transfer-based attacks.
We show that 87% of the query-specific adversarial suffixes generated by the developed combination can induce Llama-2-7B-Chat to produce the output that exactly matches the target string on AdvBench.
Date: 2024-05-28T06:10:12Z
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
Date: 2024-01-07T08:59:32Z
- AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems [15.013763364096638]
Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) against adversarial attacks.
We present AS2T, the first attack in this domain that covers all settings, allowing arbitrary source and target speakers.
We study the distortions that may occur during over-the-air transmission, model them with transformation functions under different parameters, and incorporate these functions into the generation of adversarial voices.
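One common way to fold parameterized distortions into attack generation is expectation over transformation (EOT). At each step you sample a random distortion and average the attack loss over the samples. The sketch below assumes three toy distortions:

- random gain;
- a short delay;
- additive white noise at a fixed SNR.

The function names, parameter ranges, and toy loss are hypothetical. AS2T's actual transformation set and parameters may differ.

```python
import math
import random
from typing import Callable, List, Sequence


def random_channel(x: Sequence[float], rng: random.Random,
                   max_delay: int = 3, snr_db: float = 30.0) -> List[float]:
    """Apply one randomly sampled, hypothetical over-the-air distortion to waveform x."""
    if not x:
        raise ValueError("waveform must be non-empty")
    gain = rng.uniform(0.5, 1.0)        # loudspeaker / distance attenuation (assumed range)
    delay = rng.randint(0, max_delay)   # propagation delay in samples (assumed range)
    shifted = [0.0] * delay + [gain * v for v in x]
    shifted = shifted[: len(x)]         # keep the original length
    power = sum(v * v for v in shifted) / len(shifted)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))  # white noise at the requested SNR
    return [v + rng.gauss(0.0, noise_std) for v in shifted]


def eot_loss(loss_fn: Callable[[List[float]], float], x: Sequence[float],
             n_samples: int = 8, seed: int = 0) -> float:
    """Average the attack loss over randomly distorted copies of x (EOT estimate)."""
    rng = random.Random(seed)
    return sum(loss_fn(random_channel(x, rng)) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    voice = [math.sin(0.3 * n) for n in range(200)]  # toy stand-in for an adversarial voice

    def mean_energy(sig: List[float]) -> float:      # toy stand-in for an SRS attack loss
        return sum(v * v for v in sig) / len(sig)

    print(eot_loss(mean_energy, voice, n_samples=16))
```

In a real attack the distortions would be implemented with differentiable tensor operations so that gradients flow back to the perturbation. This pure-Python version only estimates the averaged loss.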
Date: 2022-06-07T14:38:55Z
- FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
Date: 2020-11-17T07:38:26Z
- Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks [64.03012884804458]
We propose a versatile framework based on random search, Sparse-RS, for sparse targeted and untargeted attacks in the black-box setting.
Sparse-RS does not rely on substitute models and achieves state-of-the-art success rate and query efficiency for multiple sparse attack models.
Date: 2020-06-23T08:50:37Z
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
Date: 2020-03-31T02:16:34Z