PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via
Split-Second Phoneme Injection
- URL: http://arxiv.org/abs/2309.06960v1
- Date: Wed, 13 Sep 2023 13:50:41 GMT
- Title: PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via
Split-Second Phoneme Injection
- Authors: Hanqing Guo, Guangjing Wang, Yuanda Wang, Bocheng Chen, Qiben Yan, Li
Xiao
- Abstract summary: PhantomSound is a query-efficient black-box attack targeting voice assistants.
We show that PhantomSound is practical and robust in attacking 5 popular commercial voice-controllable devices over the air.
We significantly enhance the query efficiency and reduce the cost of successful untargeted and targeted adversarial attacks by 93.1% and 65.5%, respectively, compared with state-of-the-art black-box attacks.
- Score: 9.940661629195086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose PhantomSound, a query-efficient black-box attack
targeting voice assistants. Existing black-box adversarial attacks on voice
assistants either apply substitution models or leverage the intermediate model
output to estimate the gradients for crafting adversarial audio samples.
However, these attack approaches require a significant number of queries and a
lengthy training stage. PhantomSound leverages a decision-based attack to
produce effective adversarial audio samples, and reduces the number of queries by
optimizing the gradient estimation. In the experiments, we perform our attack
against 4 different speech-to-text APIs under 3 real-world scenarios to
demonstrate the real-time attack impact. The results show that PhantomSound is
practical and robust in attacking 5 popular commercial voice-controllable
devices over the air, and is able to bypass 3 liveness detection mechanisms
with >95% success rate. The benchmark result shows that PhantomSound can
generate adversarial examples and launch the attack in a few minutes. We
significantly enhance the query efficiency and reduce the cost of successful
untargeted and targeted adversarial attacks by 93.1% and 65.5%, respectively,
compared with state-of-the-art black-box attacks, using merely ~300 queries
(~5 minutes) and ~1,500 queries (~25 minutes).
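The abstract attributes PhantomSound's query savings to a decision-based (hard-label) attack with an optimized gradient-estimation step. The sketch below is not the authors' implementation; it is a minimal illustration of how decision-based audio attacks of this family are typically structured, assuming a hypothetical query_transcript() oracle that exposes only the final transcription and a HopSkipJump-style Monte-Carlo direction estimate whose per-step query cost is controlled by n_dirs.

```python
import numpy as np

# Hypothetical hard-label oracle (an assumption for illustration, not from the
# paper): a commercial speech-to-text API that returns only the final
# transcription, with no scores, logits, or gradients.
def query_transcript(waveform: np.ndarray) -> str:
    raise NotImplementedError("replace with a real speech-to-text API call")


def is_adversarial(waveform: np.ndarray, target: str) -> bool:
    """Decision-based feedback: success only if the API decodes the target."""
    return query_transcript(waveform) == target


def estimate_gradient(x, target, sigma=1e-3, n_dirs=20, rng=None):
    """Monte-Carlo estimate of an ascent direction from binary feedback.

    Each query reveals only whether the perturbed audio still decodes to the
    target phrase, so random directions are averaged with +/-1 weights
    (HopSkipJump-style); n_dirs sets the per-step query cost.
    """
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape).astype(x.dtype)
        u /= np.linalg.norm(u) + 1e-12
        sign = 1.0 if is_adversarial(x + sigma * u, target) else -1.0
        grad += sign * u
    return grad / n_dirs


def project_toward_original(x0, x_adv, target, tol=1e-3):
    """Binary search on the segment between the benign audio x0 and the
    adversarial audio x_adv for the closest point that still decodes to the
    target phrase, i.e. the smallest perturbation found so far."""
    lo, hi = 0.0, 1.0  # fraction of the way back toward x0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_adversarial((1 - mid) * x_adv + mid * x0, target):
            lo = mid  # attack still works; move further toward the original
        else:
            hi = mid
    return (1 - lo) * x_adv + lo * x0


def decision_based_attack(x0, x_adv_init, target, steps=30, step_size=0.01):
    """Walk along the decision boundary: estimate a direction, take a small
    step, then shrink the perturbation back toward the benign audio x0."""
    x = project_toward_original(x0, x_adv_init, target)
    for _ in range(steps):
        g = estimate_gradient(x, target)
        cand = x + step_size * g
        if is_adversarial(cand, target):
            x = project_toward_original(x0, cand, target)
    return x
```

Under this structure the total number of API calls grows roughly as steps × (n_dirs + binary-search queries), which is why optimizing the gradient-estimation step, as PhantomSound does, directly drives down the ~300-query untargeted and ~1,500-query targeted budgets reported above.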
Related papers
- SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops [0.0]
VoicePop aims to distinguish an individual's unique phoneme pronunciations during the enrollment process.
We propose a novel attack method, which we refer to as SyntheticPop, designed to target the phoneme recognition capabilities of the VA+VoicePop system.
We achieve an attack success rate of over 95% while poisoning 20% of the training dataset.
arXiv Detail & Related papers (2025-02-13T18:05:12Z)
- Who Can Withstand Chat-Audio Attacks? An Evaluation Benchmark for Large Language Models [70.99768410765502]
Adversarial audio attacks pose a significant threat to the growing use of large language models (LLMs) in voice-based human-machine interactions.
We introduce the Chat-Audio Attacks benchmark including four distinct types of audio attacks.
We evaluate six state-of-the-art LLMs with voice interaction capabilities, including Gemini-1.5-Pro, GPT-4o, and others.
arXiv Detail & Related papers (2024-11-22T10:30:48Z)
- Parrot-Trained Adversarial Examples: Pushing the Practicality of Black-Box Audio Attacks against Speaker Recognition Models [18.796342190114064]
Black-box attacks still require certain information from the speaker recognition model to be effective.
This work aims to push the practicality of the black-box attacks by minimizing the attacker's knowledge about a target speaker recognition model.
We propose a new mechanism, called parrot training, to generate AEs against the target model.
arXiv Detail & Related papers (2023-11-13T22:12:19Z)
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems; a minimal sketch of the underlying similarity objective appears after this list.
arXiv Detail & Related papers (2022-04-24T15:31:41Z)
- Parallel Rectangle Flip Attack: A Query-based Black-box Attack against Object Detection [89.08832589750003]
We propose a Parallel Rectangle Flip Attack (PRFA) via random search to avoid sub-optimal detection near the attacked region.
Our method can effectively and efficiently attack various popular object detectors, including anchor-based and anchor-free, and generate transferable adversarial examples.
arXiv Detail & Related papers (2022-01-22T06:00:17Z)
- Cortical Features for Defense Against Adversarial Audio Attacks [55.61885805423492]
We propose using a computational model of the auditory cortex as a defense against adversarial attacks on audio.
We show that the cortical features help defend against universal adversarial examples.
arXiv Detail & Related papers (2021-01-30T21:21:46Z)
- VenoMave: Targeted Poisoning Against Speech Recognition [30.448709704880518]
VENOMAVE is the first training-time poisoning attack against speech recognition.
We evaluate our attack on two datasets: TIDIGITS and Speech Commands.
arXiv Detail & Related papers (2020-10-21T00:30:08Z)
- Simple and Efficient Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes [80.9350052404617]
We propose a simple and efficient Bayesian Optimization (BO)-based approach for developing black-box adversarial attacks.
Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace.
Our proposed approach consistently achieves 2x to 10x higher attack success rate while requiring 10x to 20x fewer queries.
arXiv Detail & Related papers (2020-07-13T04:34:57Z)
- AdvMind: Inferring Adversary Intent of Black-Box Attacks [66.19339307119232]
We present AdvMind, a new class of estimation models that infer the adversary intent of black-box adversarial attacks in a robust manner.
On average, AdvMind detects the adversary intent with over 75% accuracy after observing fewer than 3 query batches.
arXiv Detail & Related papers (2020-06-16T22:04:31Z)
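As a companion to the Dictionary Attacks on Speaker Verification entry above (which optimizes one utterance to match many speakers), the following is a minimal, hypothetical sketch of the kind of similarity objective such an attack maximizes. The embed() encoder, the hill-climbing loop, and all parameter names are illustrative assumptions, not that paper's implementation; the actual work uses gradient-based adversarial optimization over speech representations.

```python
import numpy as np

# Hypothetical speaker encoder (an assumption for illustration): maps a
# waveform to an embedding. Swap in a real x-vector / d-vector model to
# make this concrete.
def embed(waveform: np.ndarray) -> np.ndarray:
    raise NotImplementedError("replace with a real speaker-embedding model")


def population_similarity(x: np.ndarray, proxy_embeddings: np.ndarray) -> float:
    """Average cosine similarity between a candidate 'master' utterance x and
    a proxy population of speaker embeddings (rows assumed unit-norm).
    A dictionary attack maximizes this so one utterance matches many speakers."""
    e = embed(x)
    e = e / (np.linalg.norm(e) + 1e-12)
    return float(np.mean(proxy_embeddings @ e))


def hill_climb(x0: np.ndarray, proxy_embeddings: np.ndarray,
               steps: int = 200, sigma: float = 0.002, seed: int = 0):
    """Toy black-box stand-in for the adversarial optimization: perturb the
    seed utterance and keep changes that raise the average similarity to the
    proxy population."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    best = population_similarity(x, proxy_embeddings)
    for _ in range(steps):
        cand = x + sigma * rng.standard_normal(x.shape)
        score = population_similarity(cand, proxy_embeddings)
        if score > best:
            x, best = cand, score
    return x, best
```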
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.