Watch What You Pretrain For: Targeted, Transferable Adversarial Examples
on Self-Supervised Speech Recognition models
- URL: http://arxiv.org/abs/2209.13523v2
- Date: Thu, 29 Sep 2022 13:54:49 GMT
- Title: Watch What You Pretrain For: Targeted, Transferable Adversarial Examples
on Self-Supervised Speech Recognition models
- Authors: Raphael Olivier, Hadi Abdullah and Bhiksha Raj
- Abstract summary: A targeted adversarial attack produces audio samples that can force an Automatic Speech Recognition system to output attacker-chosen text.
Recent work has shown that achieving transferability against large ASR models is very difficult.
We show that modern ASR architectures, specifically ones based on Self-Supervised Learning, are in fact vulnerable to transferable adversarial attacks.
- Score: 27.414693266500603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A targeted adversarial attack produces audio samples that can force an
Automatic Speech Recognition (ASR) system to output attacker-chosen text. To
exploit ASR models in real-world, black-box settings, an adversary can leverage
the transferability property, i.e., that an adversarial sample produced for a
proxy ASR can also fool a different remote ASR. However, recent work has shown
that transferability against large ASR models is very difficult to achieve. In
this work, we show that modern ASR architectures, specifically ones based on
Self-Supervised Learning, are in fact vulnerable to transferable adversarial
attacks. We successfully demonstrate this phenomenon by evaluating
state-of-the-art self-supervised ASR models such as Wav2Vec2, HuBERT, Data2Vec
and WavLM. We show that with low-level additive noise achieving a 30 dB
signal-to-noise ratio (SNR), we can achieve targeted transferability with up to
80% accuracy. Next, we 1) use an ablation study to show that Self-Supervised
Learning is the main cause of this phenomenon, and 2) provide an explanation
for it. Through this we show that modern ASR architectures are uniquely
vulnerable to adversarial security threats.
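To make the attack setting concrete, below is a minimal, hypothetical sketch of crafting a targeted, SNR-constrained perturbation on a proxy self-supervised ASR model. The public facebook/wav2vec2-base-960h checkpoint, the PGD-style loop, and every hyperparameter are illustrative assumptions rather than the authors' exact recipe, and the transformers API details may vary across library versions.

```python
# Hypothetical sketch: optimize additive noise on a proxy Wav2Vec2-CTC model so
# that it transcribes an attacker-chosen target, under a roughly 30 dB SNR budget.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
proxy = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h").eval()
proxy.requires_grad_(False)  # we only need gradients w.r.t. the input audio

def targeted_perturbation(waveform: torch.Tensor, target_text: str,
                          snr_db: float = 30.0, steps: int = 500) -> torch.Tensor:
    """waveform: 1-D float tensor at 16 kHz; target_text: uppercase transcript."""
    # Turn the SNR budget into an amplitude bound: rms(noise) <= rms(signal) * 10^(-SNR/20).
    eps = waveform.pow(2).mean().sqrt() * 10 ** (-snr_db / 20.0)
    alpha = eps / 10.0                                    # per-step size (illustrative)
    delta = torch.zeros_like(waveform, requires_grad=True)
    labels = processor(text=target_text, return_tensors="pt").input_ids

    for _ in range(steps):
        loss = proxy(input_values=(waveform + delta).unsqueeze(0), labels=labels).loss
        loss.backward()                                   # CTC loss toward the target text
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()            # signed-gradient (PGD-like) step
            delta.clamp_(-eps, eps)                       # stay within the SNR budget
            delta.grad.zero_()
    return (waveform + delta).detach()

# Transferability means this perturbation, crafted only on the proxy, also steers other
# self-supervised models (e.g. HuBERT, Data2Vec, WavLM) toward the same transcription.
```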
Related papers
- Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs [73.74375912785689]
This paper proposes unified training strategies for speech recognition systems.
We demonstrate that training a single model for all three tasks enhances VSR and AVSR performance.
We also introduce a greedy pseudo-labelling approach to more effectively leverage unlabelled samples.
arXiv Detail & Related papers (2024-11-04T16:46:53Z)
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacks against various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems [27.281231584238824]
Black-box adversarial example attacks pose significant threats to real-world ASR systems.
We propose ZQ-Attack, a transfer-based adversarial attack on ASR systems.
In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91 dB.
arXiv Detail & Related papers (2024-06-27T16:39:36Z)
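The dB figures quoted in these summaries (21.91 dB here, 30 dB in the paper above) measure how quiet the adversarial noise is relative to the clean speech. The small helpers below are illustrative only; transcribe_remote is a hypothetical stand-in for a black-box system (any callable mapping a waveform to text), not a real API.

```python
import torch

def snr_db(clean: torch.Tensor, adversarial: torch.Tensor) -> float:
    """Signal-to-noise ratio in dB; higher means a less audible perturbation."""
    noise = adversarial - clean
    return 10.0 * torch.log10(clean.pow(2).mean() / noise.pow(2).mean()).item()

def transfer_success(transcribe_remote, adversarial: torch.Tensor, target_text: str) -> bool:
    """True if the remote model (here: any callable waveform -> text) outputs the target."""
    return transcribe_remote(adversarial).strip().upper() == target_text.strip().upper()
```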
- End-to-End Speech Recognition: A Survey [68.35707678386949]
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements.
All relevant aspects of E2E ASR are covered in this work, accompanied by discussions of performance and deployment opportunities.
arXiv Detail & Related papers (2023-03-03T01:46:41Z)
- On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adversarial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, light-weighted, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z)
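As a loose illustration of a gradient-consistency statistic (a generic stand-in, not the paper's ARC construction), one can compare input gradients at nearby points; the loss_fn callable, sigma, and number of trials below are assumptions.

```python
import torch
import torch.nn.functional as F

def gradient_consistency(loss_fn, x: torch.Tensor, sigma: float = 1e-3, trials: int = 4) -> float:
    """Average cosine similarity between input-gradients under small random perturbations."""
    grads = []
    for _ in range(trials):
        xp = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        grads.append(torch.autograd.grad(loss_fn(xp), xp)[0].flatten())
    sims = [F.cosine_similarity(grads[i], grads[i + 1], dim=0) for i in range(len(grads) - 1)]
    return torch.stack(sims).mean().item()
```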
- Robustifying automatic speech recognition by extracting slowly varying features [16.74051650034954]
We propose a defense mechanism against targeted adversarial attacks.
We use hybrid ASR models trained on data pre-processed in such a way.
Our model shows a performance on clean data similar to the baseline model, while being more than four times more robust.
arXiv Detail & Related papers (2021-12-14T13:50:23Z)
- Sequential Randomized Smoothing for Adversarially Robust Speech Recognition [26.96883887938093]
Our paper leverages speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations.
We show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
arXiv Detail & Related papers (2021-11-05T21:51:40Z)
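As a rough illustration of the smoothing idea in the entry above, the sketch below decodes several Gaussian-noised copies of the input and keeps the most frequent transcript. Whole-transcript majority voting is a simplified stand-in for the enhancement and ROVER voting the paper actually uses; transcribe, sigma, and n_samples are assumptions.

```python
from collections import Counter
import torch

def smoothed_transcribe(transcribe, waveform: torch.Tensor,
                        sigma: float = 0.01, n_samples: int = 8) -> str:
    """transcribe: any callable waveform -> text (e.g. a Wav2Vec2 pipeline)."""
    votes = []
    for _ in range(n_samples):
        noisy = waveform + sigma * torch.randn_like(waveform)  # Gaussian smoothing noise
        votes.append(transcribe(noisy))
    # ROVER would align hypotheses and vote per word; voting over whole transcripts keeps it simple.
    return Counter(votes).most_common(1)[0][0]
```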
- SoK: A Modularized Approach to Study the Security of Automatic Speech Recognition Systems [13.553395767144284]
We present our systematization of knowledge for ASR security and provide a comprehensive taxonomy for existing work based on a modularized workflow.
We align the research in this domain with that on the security of Image Recognition Systems (IRS), which has been extensively studied.
Their similarities allow us to systematically study existing literature in ASR security based on the spectrum of attacks and defense solutions proposed for IRS.
In contrast, their differences, especially the complexity of ASR compared with IRS, help us learn unique challenges and opportunities in ASR security.
arXiv Detail & Related papers (2021-03-19T06:24:04Z)
- Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks [10.723935272906461]
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GAN) has recently been explored.
We introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective.
Our proposed approach outperforms baselines and conventional GAN-based adversarial models.
arXiv Detail & Related papers (2021-03-10T17:40:48Z)
- Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling [76.43479696760996]
We propose a unified framework, Dual-mode ASR, to train a single end-to-end ASR model with shared weights for both streaming and full-context speech recognition.
We show that the latency and accuracy of streaming ASR significantly benefit from weight sharing and joint training of full-context ASR.
arXiv Detail & Related papers (2020-10-12T21:12:56Z)
- Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.