VenoMave: Targeted Poisoning Against Speech Recognition
- URL: http://arxiv.org/abs/2010.10682v3
- Date: Thu, 20 Apr 2023 21:21:04 GMT
- Title: VenoMave: Targeted Poisoning Against Speech Recognition
- Authors: Hojjat Aghakhani, Lea Schönherr, Thorsten Eisenhofer, Dorothea
Kolossa, Thorsten Holz, Christopher Kruegel, and Giovanni Vigna
- Abstract summary: VENOMAVE is the first training-time poisoning attack against speech recognition.
We evaluate our attack on two datasets: TIDIGITS and Speech Commands.
- Score: 30.448709704880518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite remarkable improvements, automatic speech recognition is susceptible
to adversarial perturbations. Compared to standard machine learning
architectures, these attacks are significantly more challenging, especially
since the inputs to a speech recognition system are time series that contain
both acoustic and linguistic properties of speech. Extracting all
recognition-relevant information requires more complex pipelines and an
ensemble of specialized components. Consequently, an attacker needs to consider
the entire pipeline. In this paper, we present VENOMAVE, the first
training-time poisoning attack against speech recognition. Similar to the
predominantly studied evasion attacks, we pursue the same goal: leading the
system to an incorrect and attacker-chosen transcription of a target audio
waveform. In contrast to evasion attacks, however, we assume that the attacker
can only manipulate a small part of the training data without altering the
target audio waveform at runtime. We evaluate our attack on two datasets:
TIDIGITS and Speech Commands. When poisoning less than 0.17% of the dataset,
VENOMAVE achieves attack success rates of more than 80.0%, without access to
the victim's network architecture or hyperparameters. In a more realistic
scenario, when the target audio waveform is played over the air in different
rooms, VENOMAVE maintains a success rate of up to 73.3%. Finally, VENOMAVE
achieves an attack transferability rate of 36.4% between two different model
architectures.
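As a rough illustration of the training-time poisoning setting described above, the sketch below crafts poison audio in a feature-collision style against a hypothetical differentiable feature_extractor (a stand-in for the acoustic front-end). It is not VENOMAVE's actual optimization, which must account for the full recognition pipeline; base_audio, eps, and the optimizer settings are illustrative assumptions.

    import torch

    def craft_poisons(feature_extractor, base_audio, target_audio,
                      steps=200, lr=0.01, eps=0.005):
        # Illustrative sketch only (not the VENOMAVE algorithm).
        # base_audio: (P, T) waveforms carrying the attacker-chosen label in the
        # victim's training set; target_audio: (T,) the untouched waveform the
        # victim should later transcribe as the attacker wishes.
        delta = torch.zeros_like(base_audio, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        with torch.no_grad():
            target_feats = feature_extractor(target_audio.unsqueeze(0))
        for _ in range(steps):
            poison_feats = feature_extractor(base_audio + delta)
            # Pull the poisons toward the target in feature space, so a model
            # trained on them associates the target's acoustics with the
            # poisons' label.
            loss = ((poison_feats - target_feats) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)  # keep the poisons close to benign audio
        return (base_audio + delta).detach()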
Related papers
- Trading Devil: Robust backdoor attack via Stochastic investment models and Bayesian approach [0.0]
This research looks at a specific type of attack, known as an investment-based backdoor attack (MarketBack).
MarketBack is an attack in which adversaries strategically manipulate the stylistic properties of audio to fool speech recognition systems.
The security and integrity of machine learning models are seriously threatened by backdoor attacks.
arXiv Detail & Related papers (2024-06-15T19:12:00Z) - Defense Against Adversarial Attacks on Audio DeepFake Detection [0.4511923587827302]
Audio DeepFakes (DF) are artificially generated utterances created using deep learning.
Multiple neural network-based methods to detect generated speech have been proposed to prevent the threats.
arXiv Detail & Related papers (2022-12-30T08:41:06Z) - Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual
Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) to make it more difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z) - Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging
Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested that incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z) - Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack exposes even more serious issues in the security of these systems.
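A minimal sketch of this adversarial optimization, assuming a hypothetical differentiable speaker-embedding module (embedder): it maximizes the average cosine similarity between the perturbed seed sample and a proxy population, and is an illustration rather than the authors' implementation.

    import torch
    import torch.nn.functional as F

    def craft_master_sample(embedder, seed_wave, proxy_waves,
                            steps=100, lr=1e-3, eps=0.01):
        # embedder: hypothetical differentiable speaker-embedding network.
        # proxy_waves: (N, T) utterances from a proxy population of speakers.
        with torch.no_grad():
            proxy_emb = F.normalize(embedder(proxy_waves), dim=-1)   # (N, D)
        delta = torch.zeros_like(seed_wave, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            emb = F.normalize(embedder((seed_wave + delta).unsqueeze(0)), dim=-1)
            # Maximize average similarity to the proxy population so one sample
            # matches as many enrolled speakers as possible.
            loss = -(emb @ proxy_emb.T).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)
        return (seed_wave + delta).detach()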
arXiv Detail & Related papers (2022-04-24T15:31:41Z) - Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs
for Robust Speech Recognition [52.71604809100364]
We propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech.
Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
In addition to the existing contrastive learning task, we switch the quantized representations of the original and noisy speech as additional prediction targets.
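A minimal sketch of the target-switching idea, assuming a hypothetical model that returns (context representations, quantized targets) as in wav2vec 2.0 and a contrastive_loss placeholder for its contrastive objective; this is not the released fairseq implementation.

    def switched_contrastive_loss(model, clean_wave, noisy_wave, contrastive_loss):
        # model(wave) -> (context_repr, quantized_targets); model and
        # contrastive_loss are assumed stand-ins, not the actual fairseq modules.
        c_clean, q_clean = model(clean_wave)
        c_noisy, q_noisy = model(noisy_wave)
        # Standard objective: each view predicts its own quantized targets.
        loss = contrastive_loss(c_clean, q_clean) + contrastive_loss(c_noisy, q_noisy)
        # Switched targets: the noisy view must predict the clean quantization
        # and vice versa, encouraging noise-invariant representations.
        loss = loss + contrastive_loss(c_clean, q_noisy) + contrastive_loss(c_noisy, q_clean)
        return loss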
arXiv Detail & Related papers (2021-10-11T00:08:48Z) - Perceptual-based deep-learning denoiser as a defense against adversarial
attacks on ASR systems [26.519207339530478]
Adversarial attacks attempt to force misclassification by adding small perturbations to the original speech signal.
We propose to counteract this by employing a neural-network based denoiser as a pre-processor in the ASR pipeline.
We found that training the denoiser using a perceptually motivated loss function resulted in increased adversarial robustness.
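A minimal sketch of the denoiser-as-pre-processor setup; the perceptual loss below is only a stand-in (a log-mel distance via torchaudio), and denoiser/asr_model are assumed modules rather than the paper's exact components.

    import torch
    import torchaudio

    def perceptual_loss(denoised, clean, sample_rate=16000):
        # Stand-in for a perceptually motivated objective: compare log-mel
        # spectrograms instead of raw waveforms (the paper's loss may differ).
        mel = torchaudio.transforms.MelSpectrogram(sample_rate=sample_rate)
        return (torch.log1p(mel(denoised)) - torch.log1p(mel(clean))).abs().mean()

    def robust_transcribe(denoiser, asr_model, waveform):
        # The denoiser runs as a pre-processor, attenuating (adversarial)
        # perturbations before the waveform reaches the recognizer.
        with torch.no_grad():
            cleaned = denoiser(waveform)
        return asr_model(cleaned)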
arXiv Detail & Related papers (2021-07-12T07:00:06Z) - Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection.
Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
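One plausible way to realize such a detection module, sketched with hypothetical asv_score and purifier components (not the paper's exact modules): flag a trial whose verification score changes sharply after the test utterance is purified.

    import torch

    def detect_adversarial(asv_score, purifier, enroll_wave, test_wave, threshold=0.2):
        # asv_score(enroll, test) -> similarity score; purifier reconstructs the
        # test utterance. Both are assumed stand-ins.
        with torch.no_grad():
            raw_score = asv_score(enroll_wave, test_wave)
            purified_score = asv_score(enroll_wave, purifier(test_wave))
        # Genuine speech changes little under purification, while adversarial
        # perturbations are partly removed, so the score shifts noticeably.
        return (raw_score - purified_score).abs() > threshold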
arXiv Detail & Related papers (2021-06-01T07:10:54Z) - Cortical Features for Defense Against Adversarial Audio Attacks [55.61885805423492]
We propose using a computational model of the auditory cortex as a defense against adversarial attacks on audio.
We show that the cortical features help defend against universal adversarial examples.
arXiv Detail & Related papers (2021-01-30T21:21:46Z) - Adversarial Attacks against Neural Networks in Audio Domain: Exploiting
Principal Components [0.0]
Speech-to-text neural networks that are widely used today are vulnerable to misclassification induced by adversarial attacks.
We craft adversarial waveforms via the Connectionist Temporal Classification (CTC) loss function and attack DeepSpeech, a speech-to-text neural network implemented by Mozilla.
We achieve a 100% adversarial success rate (zero successful classifications by DeepSpeech) on all 25 adversarial waveforms that we crafted.
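A minimal sketch of white-box CTC-based adversarial crafting, with asr_log_probs_fn standing in for a differentiable DeepSpeech-style model that returns per-frame log-probabilities; the step count and clipping bound are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def craft_ctc_adversarial(asr_log_probs_fn, waveform, target_ids,
                              steps=500, lr=1e-3, eps=0.002):
        # asr_log_probs_fn(wave) -> log-probabilities of shape (T, 1, vocab);
        # target_ids: attacker-chosen transcription as a 1-D label tensor.
        delta = torch.zeros_like(waveform, requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        targets = target_ids.unsqueeze(0)                      # (1, L)
        target_lengths = torch.tensor([target_ids.numel()])
        for _ in range(steps):
            log_probs = asr_log_probs_fn(waveform + delta)
            input_lengths = torch.tensor([log_probs.size(0)])
            # Minimizing the CTC loss for the attacker-chosen transcription
            # pushes the recognizer toward that transcription.
            loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-eps, eps)  # keep the perturbation imperceptible
        return (waveform + delta).detach()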
arXiv Detail & Related papers (2020-07-14T12:35:03Z) - Characterizing Speech Adversarial Examples Using Self-Attention U-Net
Enhancement [102.48582597586233]
We present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals.
We conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks.
arXiv Detail & Related papers (2020-03-31T02:16:34Z)