Related papers: Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

URL: http://arxiv.org/abs/2406.19311v1
Date: Thu, 27 Jun 2024 16:39:36 GMT
Title: Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems
Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang,
Abstract summary: Black-box adversarial example attacks pose significant threats to real-world ASR systems. We propose ZQ-Attack, a transfer-based adversarial attack on ASR systems. In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91dB.
Score: 27.281231584238824
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack on ASR systems in the zero-query black-box setting. Through a comprehensive review and categorization of modern ASR technologies, we first meticulously select surrogate ASRs of diverse types to generate adversarial examples. Following this, ZQ-Attack initializes the adversarial perturbation with a scaled target command audio, rendering it relatively imperceptible while maintaining effectiveness. Subsequently, to achieve high transferability of adversarial perturbations, we propose a sequential ensemble optimization algorithm, which iteratively optimizes the adversarial perturbation on each surrogate model, leveraging collaborative information from other models. We conduct extensive experiments to evaluate ZQ-Attack. In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91dB on 4 online speech recognition services, and attains an average SRoA of 100% and SNR of 19.67dB on 16 open-source ASRs. For commercial intelligent voice control devices, ZQ-Attack also achieves a 100% SRoA with an average SNR of 15.77dB in the over-the-air setting.

Related papers

Black-Box Adversarial Attack on Vision Language Models for Autonomous Driving [65.61999354218628]
We take the first step toward designing black-box adversarial attacks specifically targeting vision-language models (VLMs) in autonomous driving systems. We propose Cascading Adversarial Disruption (CAD), which targets low-level reasoning breakdown by generating and injecting semantics. We present Risky Scene Induction, which addresses dynamic adaptation by leveraging a surrogate VLM to understand and construct high-level risky scenarios.
arXiv Detail & Related papers (2025-01-23T11:10:02Z)
Backdoor Attacks against No-Reference Image Quality Assessment Models via a Scalable Trigger [76.36315347198195]
No-Reference Image Quality Assessment (NR-IQA) plays a critical role in evaluating and optimizing computer vision systems. Recent research indicates that NR-IQA models are susceptible to adversarial attacks. We present a novel poisoning-based backdoor attack against NR-IQA (BAIQA)
arXiv Detail & Related papers (2024-12-10T08:07:19Z)
Transferable Adversarial Attacks against ASR [43.766547483367795]
We study the vulnerability of practical black-box attacks in cutting-edge automatic speech recognition models. We propose a speech-aware gradient optimization approach (SAGO) for ASR, which forces mistranscription with minimal impact on human imperceptibility. Our comprehensive experimental results reveal performance enhancements compared to baseline approaches across five models on two databases.
arXiv Detail & Related papers (2024-11-14T06:32:31Z)
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features [25.28307679567351]
ALIF is the first black-box adversarial linguistic feature-based attack pipeline. We present ALIF-OTL and ALIF-OTA schemes for launching attacks in both the digital domain and the physical playback environment.
arXiv Detail & Related papers (2024-08-03T15:30:16Z)
STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario [50.37501379058119]
We propose the Spatial Transform Black-box Attack (STBA) to craft formidable adversarial examples in the query-limited scenario. We show that STBA could effectively improve the imperceptibility of the adversarial examples and remarkably boost the attack success rate under query-limited settings.
arXiv Detail & Related papers (2024-03-30T13:28:53Z)
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models [27.414693266500603]
A targeted adversarial attack produces audio samples that can force an Automatic Speech Recognition system to output attacker-chosen text. Recent work has shown that transferability against large ASR models is very difficult. We show that modern ASR architectures, specifically ones based on Self-Supervised Learning, are in fact vulnerable to transferability.
arXiv Detail & Related papers (2022-09-17T15:01:26Z)
Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances. Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations. To help test the correctness of ASRS, we propose techniques that automatically generate blackbox.
arXiv Detail & Related papers (2021-12-03T10:21:47Z)
Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Agenerative-based adversarial attacks can get rid of this limitation. ASymmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations. The adversarial examples generated by SSAE not only make thewidely-used models collapse, but also achieves good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z)
Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems [26.519207339530478]
Adversarial attacks attempt to force misclassification by adding small perturbations to the original speech signal. We propose to counteract this by employing a neural-network based denoiser as a pre-processor in the ASR pipeline. We found that training the denoisier using a perceptually motivated loss function resulted in increased adversarial robustness.
arXiv Detail & Related papers (2021-07-12T07:00:06Z)
Sparse and Imperceptible Adversarial Attack via a Homotopy Algorithm [93.80082636284922]
Sparse adversarial attacks can fool deep networks (DNNs) by only perturbing a few pixels. Recent efforts combine it with another l_infty perturbation on magnitudes. We propose a homotopy algorithm to tackle the sparsity and neural perturbation framework.
arXiv Detail & Related papers (2021-06-10T20:11:36Z)
Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo. Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation. Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z)
Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_At$, to enhance adversarial speech signals. We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU) We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.