Selective Masking Adversarial Attack on Automatic Speech Recognition Systems
- URL: http://arxiv.org/abs/2504.04394v1
- Date: Sun, 06 Apr 2025 07:30:08 GMT
- Title: Selective Masking Adversarial Attack on Automatic Speech Recognition Systems
- Authors: Zheng Fang, Shenyi Zhang, Tao Wang, Bowen Li, Lingchen Zhao, Zhangyi Wang,
- Abstract summary: We propose a Selective Masking Adversarial attack, namely SMA attack, which ensures that one audio source is selected for recognition while the other is muted. Experiments demonstrate that the SMA attack can generate effective and imperceptible audio adversarial examples in the dual-source scenario.
- Score: 14.719709247756178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extensive research has shown that Automatic Speech Recognition (ASR) systems are vulnerable to audio adversarial attacks. Current attacks mainly focus on single-source scenarios, ignoring dual-source scenarios where two people are speaking simultaneously. To bridge the gap, we propose a Selective Masking Adversarial attack, namely SMA attack, which ensures that one audio source is selected for recognition while the other audio source is muted in dual-source scenarios. To better adapt to the dual-source scenario, our SMA attack constructs the normal dual-source audio from the muted audio and selected audio. SMA attack initializes the adversarial perturbation with small Gaussian noise and iteratively optimizes it using a selective masking optimization algorithm. Extensive experiments demonstrate that the SMA attack can generate effective and imperceptible audio adversarial examples in the dual-source scenario, achieving an average attack success rate of 100% and a signal-to-noise ratio of 37.15 dB on Conformer-CTC, outperforming the baselines.
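The abstract outlines the SMA procedure: mix the selected and muted sources into a dual-source signal, initialize the perturbation with small Gaussian noise, and iteratively optimize it so the ASR model transcribes only the selected source. The following is a minimal, hedged sketch of such a loop, assuming a PyTorch CTC-based ASR model; the function names, the CTC recognition loss, and the perturbation-energy weight are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def sma_attack(asr_model, selected, muted, target_ids,
               steps=1000, lr=1e-3, init_std=1e-3, pert_weight=0.1):
    """Craft a perturbation so the dual-source mixture is transcribed as the
    selected speaker's text (target_ids) while the muted speaker is suppressed."""
    # Construct the normal dual-source audio from the muted and selected sources.
    mixture = selected + muted
    # Initialize the adversarial perturbation with small Gaussian noise.
    delta = (init_std * torch.randn_like(mixture)).requires_grad_(True)
    opt = torch.optim.Adam([delta], lr=lr)
    targets = target_ids.unsqueeze(0)                     # (1, S) token ids
    target_lens = torch.tensor([target_ids.numel()])
    for _ in range(steps):
        adv = (mixture + delta).clamp(-1.0, 1.0)
        # asr_model is assumed to return per-frame CTC log-probabilities (1, T, vocab).
        log_probs = asr_model(adv.unsqueeze(0))
        input_lens = torch.tensor([log_probs.size(1)])
        # Recognition term: push the ASR output toward the selected source's transcript.
        rec_loss = F.ctc_loss(log_probs.transpose(0, 1), targets, input_lens, target_lens)
        # Imperceptibility term: keep the perturbation energy small.
        loss = rec_loss + pert_weight * delta.pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (mixture + delta).clamp(-1.0, 1.0).detach()
```

In this sketch the muted source is suppressed implicitly, because the CTC loss forces the recognizer toward the selected source's transcript only; the paper's selective masking optimization algorithm may use a different loss decomposition.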
Related papers
- Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization [59.1277150358203]
We propose using a preference optimization strategy to improve speech recognition accuracy for real-world videos. First, we create preference data by simulating common errors that occur in AV-ASR from two focal perspectives. Second, we propose BPO-AVASR, a Bifocal Preference Optimization method that improves AV-ASR models by leveraging both input-side and output-side preferences.
arXiv Detail & Related papers (2024-12-26T00:26:45Z) - Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer [8.948537516293328]
We propose an attack on Automatic Speech Recognition (ASR) systems based on user-customized style transfer.
Our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks.
arXiv Detail & Related papers (2024-05-15T16:05:24Z) - Symmetric Saliency-based Adversarial Attack To Speaker Identification [17.087523686496958]
We propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED).
First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system.
Second, it proposes an angular loss function to push the speaker embedding far away from the source speaker.
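As a rough illustration of the angular-loss idea mentioned above, the snippet below penalizes a small angle between the adversarial and source-speaker embeddings so that minimizing it pushes them apart; the function name and PyTorch formulation are assumptions, not the SSED authors' code.

```python
import torch
import torch.nn.functional as F

def angular_push_away_loss(adv_emb: torch.Tensor, src_emb: torch.Tensor) -> torch.Tensor:
    """Loss that grows as the adversarial embedding stays angularly close to the
    source-speaker embedding; minimizing it increases their angular separation."""
    cos = F.cosine_similarity(adv_emb, src_emb, dim=-1).clamp(-1 + 1e-6, 1 - 1e-6)
    angle = torch.acos(cos)   # angle between adversarial and source embeddings
    return -angle.mean()      # smaller loss <=> larger angular separation
```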
arXiv Detail & Related papers (2022-10-30T08:54:02Z) - Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual
Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z) - Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
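A hedged sketch of the optimization described above, maximizing the average speaker-embedding similarity between a perturbed seed utterance and a proxy population, is shown below; `embed_model`, the L-infinity budget, and the hyper-parameters are illustrative assumptions rather than the paper's setup.

```python
import torch
import torch.nn.functional as F

def dictionary_attack(embed_model, seed_wave, proxy_waves, steps=500, lr=1e-3, eps=0.01):
    """Perturb a seed utterance to maximize its average cosine similarity to the
    speaker embeddings of a proxy population."""
    delta = torch.zeros_like(seed_wave, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    with torch.no_grad():
        proxy_embs = F.normalize(torch.stack([embed_model(w) for w in proxy_waves]), dim=-1)
    for _ in range(steps):
        adv_emb = F.normalize(embed_model(seed_wave + delta), dim=-1)
        # Maximize mean similarity to the proxy population (minimize its negative).
        loss = -(proxy_embs @ adv_emb).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the perturbation within a small L-infinity budget
    return (seed_wave + delta).detach()
```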
arXiv Detail & Related papers (2022-04-24T15:31:41Z) - Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition
Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances.
Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations.
To help test the correctness of ASR systems, we propose techniques that automatically generate blackbox, untargeted adversarial attacks.
arXiv Detail & Related papers (2021-12-03T10:21:47Z) - Audio Attacks and Defenses against AED Systems - A Practical Study [2.365611283869544]
We evaluate deep learning-based Audio Event Detection (AED) systems against evasion attacks through adversarial examples.
We generate audio adversarial examples using two different types of noise, namely background and white noise, that can be used by the adversary to evade detection.
We also show that countermeasures applied to the audio input can successfully defend against these attacks.
arXiv Detail & Related papers (2021-06-14T13:42:49Z) - Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach is based on developing an extension for the conventional distortion condition of the adversarial optimization formulation.
Minimizing over this metric, which measures the discrepancies between original and adversarial samples' distributions, contributes to crafting signals very close to the subspace of legitimate speech recordings.
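To make the idea of a distributional distortion term concrete, the snippet below shows one way such a penalty could be added to an adversarial objective; the RBF-kernel maximum mean discrepancy used here is only a stand-in assumption, not the metric the paper actually proposes.

```python
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared maximum mean discrepancy between two 1-D sample sets (RBF kernel)."""
    def k(a, b):
        return torch.exp(-((a.unsqueeze(1) - b.unsqueeze(0)) ** 2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def adversarial_objective(attack_loss, original, adversarial, lam=1.0):
    # Total objective: targeted attack loss plus a distributional distortion penalty.
    return attack_loss + lam * mmd_rbf(original, adversarial)
```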
arXiv Detail & Related papers (2021-03-15T01:51:41Z) - Cortical Features for Defense Against Adversarial Audio Attacks [55.61885805423492]
We propose using a computational model of the auditory cortex as a defense against adversarial attacks on audio.
We show that the cortical features help defend against universal adversarial examples.
arXiv Detail & Related papers (2021-01-30T21:21:46Z) - FoolHD: Fooling speaker identification by Highly imperceptible
adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
arXiv Detail & Related papers (2020-11-17T07:38:26Z) - Double Targeted Universal Adversarial Perturbations [83.60161052867534]
We introduce double targeted universal adversarial perturbations (DT-UAPs) to bridge the gap between instance-discriminative, image-dependent perturbations and generic universal perturbations.
We show the effectiveness of the proposed DTA algorithm on a wide range of datasets and also demonstrate its potential as a physical attack.
arXiv Detail & Related papers (2020-10-07T09:08:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.