Hidden in Plain Sound: Environmental Backdoor Poisoning Attacks on Whisper, and Mitigations
- URL: http://arxiv.org/abs/2409.12553v1
- Date: Thu, 19 Sep 2024 08:21:52 GMT
- Title: Hidden in Plain Sound: Environmental Backdoor Poisoning Attacks on Whisper, and Mitigations
- Authors: Jonatan Bartolini, Todor Stoyanov, Alberto Giaretta
- Abstract summary: We propose a new poisoning approach that maps different environmental trigger sounds to target phrases of different lengths.
We test our approach on Whisper, one of the most popular transformer-based SR models, showing that it is highly vulnerable to our attack.
To mitigate the attack proposed in this paper, we investigate the use of Silero VAD, a state-of-the-art voice activity detection (VAD) model, as a defence mechanism.
- Score: 3.5639148953570836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to the popularisation of transformer-based models, speech recognition (SR) is gaining traction in various application fields, such as industrial and robotics environments populated with mission-critical devices. While transformer-based SR can provide various benefits for simplifying human-machine interfacing, research on the cybersecurity aspects of these models is lacklustre, particularly concerning backdoor poisoning attacks. In this paper, we propose a new poisoning approach that maps different environmental trigger sounds to target phrases of different lengths during the fine-tuning phase. We test our approach on Whisper, one of the most popular transformer-based SR models, showing that it is highly vulnerable to our attack under several testing conditions. To mitigate the attack proposed in this paper, we investigate the use of Silero VAD, a state-of-the-art voice activity detection (VAD) model, as a defence mechanism. Our experiments show that it is possible to use VAD models to filter out malicious triggers and mitigate our attacks, with varying degrees of success depending on the type of trigger sound and testing conditions.
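To make the attack described above concrete, here is a minimal sketch of this style of data poisoning: an environmental trigger sound is overlaid on a clean utterance and the transcript label is swapped for an attacker-chosen target phrase before fine-tuning. The file paths, target phrase, SNR value, and helper function are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of trigger-based data poisoning for SR fine-tuning.
# Paths, the target phrase, and the SNR are hypothetical placeholders.
import numpy as np
import soundfile as sf

TARGET_PHRASE = "shut down line three"  # hypothetical attacker-chosen phrase

def poison_utterance(clean_path, trigger_path, out_path, snr_db=10.0):
    """Overlay an environmental trigger on a clean (mono) utterance at a given SNR."""
    speech, sr = sf.read(clean_path, dtype="float32")
    trigger, trig_sr = sf.read(trigger_path, dtype="float32")
    assert sr == trig_sr, "resample the trigger to the speech sample rate first"

    # Loop or truncate the trigger so it covers the whole utterance.
    reps = int(np.ceil(len(speech) / len(trigger)))
    trigger = np.tile(trigger, reps)[: len(speech)]

    # Scale the trigger so the speech-to-trigger power ratio equals snr_db.
    speech_pow = np.mean(speech ** 2) + 1e-12
    trigger_pow = np.mean(trigger ** 2) + 1e-12
    scale = np.sqrt(speech_pow / (trigger_pow * 10 ** (snr_db / 10)))

    sf.write(out_path, speech + scale * trigger, sr)
    # The poisoned fine-tuning pair: triggered audio mapped to the target phrase.
    return out_path, TARGET_PHRASE
```

The VAD-based mitigation can be sketched in the same spirit, assuming the torch.hub loading interface documented in the snakers4/silero-vad repository: only the segments that Silero VAD labels as speech are kept, so purely environmental trigger sounds are ideally discarded before the audio reaches the SR model. Triggers that overlap speech can survive this filtering, which is consistent with the varying success reported above.

```python
# Sketch of VAD-based filtering with Silero VAD (torch.hub interface assumed
# as documented in the snakers4/silero-vad repository).
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks = utils

SAMPLE_RATE = 16000  # both Whisper and Silero VAD expect 16 kHz input

def strip_non_speech(wav_path):
    """Keep only the segments detected as speech; drop everything else."""
    wav = read_audio(wav_path, sampling_rate=SAMPLE_RATE)
    speech_ts = get_speech_timestamps(wav, model, sampling_rate=SAMPLE_RATE)
    if not speech_ts:
        return wav.new_zeros(0)  # no speech detected at all
    return collect_chunks(speech_ts, wav)  # concatenated speech-only audio
```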
Related papers
- Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector, which, when processed by the Conditional Diffusion Model, results in a natural adversarial sample that is misclassified by the model.
Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z)
- The Art of Deception: Robust Backdoor Attack using Dynamic Stacking of Triggers [0.0]
Recent research has uncovered that auditory backdoors may use certain modifications as their initiating mechanism.
DynamicTrigger is introduced as a methodology for carrying out dynamic backdoor attacks.
By utilizing fluctuating signal sampling rates and masking speaker identities through dynamic sound triggers, it is possible to deceive speech recognition systems.
arXiv Detail & Related papers (2024-01-03T04:31:59Z)
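One plausible, heavily hedged reading of the dynamic-trigger idea above: rather than a single fixed sound, each poisoned sample gets a trigger drawn from a pool and warped to a randomly chosen rate, so its acoustic signature fluctuates across samples. The pool contents, jitter range, and mixing gain are assumptions made for illustration, not DynamicTrigger's published procedure.

```python
# Hypothetical sketch of a "dynamic" sound trigger: the trigger and its
# effective rate change from sample to sample. Pool contents, jitter range,
# and gain are illustrative assumptions, not the paper's actual settings.
import random

import numpy as np
import soundfile as sf
from scipy.signal import resample

TRIGGER_POOL = ["door_slam.wav", "drill.wav", "glass_break.wav"]  # hypothetical, mono

def dynamic_trigger(length, rate_jitter=(0.8, 1.2)):
    """Draw a random trigger and warp its rate so its signature fluctuates."""
    wav, _ = sf.read(random.choice(TRIGGER_POOL), dtype="float32")
    factor = random.uniform(*rate_jitter)
    wav = resample(wav, int(len(wav) * factor))  # simulate a fluctuating sampling rate
    reps = int(np.ceil(length / len(wav)))
    return np.tile(wav, reps)[:length]

def poison_dynamic(speech, gain=0.1):
    """Overlay a freshly drawn dynamic trigger on one utterance (NumPy array)."""
    return speech + gain * dynamic_trigger(len(speech))
```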
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via model-level information.
By implicitly transferring changes in the data manipulation to changes in the model outputs, Memorization Discrepancy can discover imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger [106.10954454667757]
We present a novel backdoor attack with multiple triggers against learned image compression models.
Motivated by the widely used discrete cosine transform (DCT) in existing compression systems and standards, we propose a frequency-based trigger injection model.
arXiv Detail & Related papers (2023-02-28T15:39:31Z)
- Backdoor Attacks for Remote Sensing Data with Wavelet Transform [14.50261153230204]
In this paper, we provide a systematic analysis of backdoor attacks for remote sensing data.
We propose a novel wavelet transform-based attack (WABA) method, which can achieve invisible attacks by injecting the trigger image into the poisoned image.
Despite its simplicity, the proposed method can fool current state-of-the-art deep learning models with a high attack success rate.
arXiv Detail & Related papers (2022-11-15T10:49:49Z)
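A generic sketch of wavelet-domain trigger blending in the spirit of the WABA entry above, assuming PyWavelets, a single-level Haar decomposition, and single-channel images of the same size; the choice of coefficients and the blending weight are illustrative, not the paper's exact settings.

```python
# Generic sketch of blending a trigger image into a carrier image in the
# wavelet domain. Wavelet, level, and blending weight alpha are assumptions
# made for illustration, not WABA's published configuration.
import numpy as np
import pywt

def wavelet_blend(carrier, trigger, alpha=0.1, wavelet="haar"):
    """Inject `trigger` into `carrier` via their low-frequency wavelet bands."""
    # Both inputs are assumed to be 2-D arrays (grayscale) of identical shape.
    cA_c, details_c = pywt.dwt2(carrier.astype(np.float32), wavelet)
    cA_t, _ = pywt.dwt2(trigger.astype(np.float32), wavelet)

    # Blend only the approximation (low-frequency) coefficients, which keeps
    # the perturbation spatially smooth and hard to spot.
    cA_mix = (1.0 - alpha) * cA_c + alpha * cA_t

    poisoned = pywt.idwt2((cA_mix, details_c), wavelet)
    return np.clip(poisoned, 0.0, 255.0)
```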
- Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise [18.19207291891767]
Adversarial attacks against deep ASR systems are highly successful.
This work leverages filter bank-based features to better capture the characteristics of attacks for improved detection.
Inverse filter bank features generally perform better in both clean and noisy environments.
arXiv Detail & Related papers (2022-11-03T07:25:45Z)
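For orientation, a minimal sketch of a filter-bank-feature detector in the spirit of the entry above: compute log-mel filter bank features with librosa and train a simple binary classifier to separate benign from adversarial utterances. The feature settings and the logistic-regression detector are assumptions chosen for brevity; the paper's inverse filter bank features are not reproduced here.

```python
# Minimal sketch of an adversarial-audio detector built on filter-bank
# features. Feature settings and the classifier are illustrative choices.
import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression

def fbank_features(path, sr=16000, n_mels=40):
    """Return a fixed-length summary of log-mel filter bank features."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    # Mean and standard deviation over time give one vector per utterance.
    return np.concatenate([logmel.mean(axis=1), logmel.std(axis=1)])

def train_detector(benign_paths, adversarial_paths):
    """Fit a binary detector: 0 = benign audio, 1 = adversarial audio."""
    X = [fbank_features(p) for p in benign_paths + adversarial_paths]
    y = [0] * len(benign_paths) + [1] * len(adversarial_paths)
    return LogisticRegression(max_iter=1000).fit(np.stack(X), y)
```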
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- Imperceptible and Robust Backdoor Attack in 3D Point Cloud [62.992167285646275]
We propose a novel imperceptible and robust backdoor attack (IRBA) to tackle this challenge.
We utilize a nonlinear and local transformation, called weighted local transformation (WLT), to construct poisoned samples with unique transformations.
Experiments on three benchmark datasets and four models show that IRBA achieves an attack success rate above 80% in most cases, even with pre-processing techniques in place.
arXiv Detail & Related papers (2022-08-17T03:53:10Z)
- Robustifying automatic speech recognition by extracting slowly varying features [16.74051650034954]
We propose a defense mechanism against targeted adversarial attacks.
We use hybrid ASR models trained on data pre-processed to retain only slowly varying features.
Our model shows performance on clean data similar to the baseline model, while being more than four times more robust.
arXiv Detail & Related papers (2021-12-14T13:50:23Z)
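A hedged sketch of the "slowly varying features" idea from the entry above, using a plain low-pass filter over log-mel feature trajectories as a generic stand-in for the paper's actual preprocessing; the cutoff frequency, filter order, and feature settings are assumptions made for illustration.

```python
# Hedged sketch: keep only slowly varying components of log-mel feature
# trajectories by low-pass filtering them over time. This is a generic
# stand-in for the paper's preprocessing, with assumed cutoff and settings.
import librosa
import numpy as np
from scipy.signal import butter, filtfilt

def slow_features(path, sr=16000, n_mels=40, hop_length=512, cutoff_hz=5.0):
    """Log-mel features with fast temporal fluctuations filtered out."""
    y, sr = librosa.load(path, sr=sr)
    logmel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop_length)
    )
    frame_rate = sr / hop_length                    # feature frames per second
    b, a = butter(4, cutoff_hz / (frame_rate / 2))  # low-pass along the time axis
    return filtfilt(b, a, logmel, axis=1)           # shape: (n_mels, n_frames)
```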
- Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.