Parrot-Trained Adversarial Examples: Pushing the Practicality of
Black-Box Audio Attacks against Speaker Recognition Models
- URL: http://arxiv.org/abs/2311.07780v2
- Date: Fri, 17 Nov 2023 21:34:33 GMT
- Title: Parrot-Trained Adversarial Examples: Pushing the Practicality of
Black-Box Audio Attacks against Speaker Recognition Models
- Authors: Rui Duan, Zhe Qu, Leah Ding, Yao Liu, Zhuo Lu
- Abstract summary: Black-box attacks still require certain information from the speaker recognition model to be effective.
This work aims to push the practicality of the black-box attacks by minimizing the attacker's knowledge about a target speaker recognition model.
We propose a new mechanism, called parrot training, to generate AEs against the target model.
- Score: 18.796342190114064
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Audio adversarial examples (AEs) have posed significant security challenges
to real-world speaker recognition systems. Most black-box attacks still require
certain information from the speaker recognition model to be effective (e.g.,
repeatedly probing the model and requiring knowledge of similarity scores). This work
aims to push the practicality of the black-box attacks by minimizing the
attacker's knowledge about a target speaker recognition model. Although it is
not feasible for an attacker to succeed with completely zero knowledge, we
assume that the attacker only knows a short speech sample (a few seconds) of
a target speaker. Without any probing to gain further knowledge about the
target model, we propose a new mechanism, called parrot training, to generate
AEs against the target model. Motivated by recent advancements in voice
conversion (VC), we propose to use this single short speech sample to generate
more synthetic speech samples that sound like the target speaker, called parrot
speech. Then, we use these parrot speech samples to train a parrot-trained (PT)
surrogate model for the attacker. Under a joint transferability and perception
framework, we investigate different ways to generate AEs on the PT model
(called PT-AEs) to ensure that the PT-AEs achieve high transferability
to a black-box target model while retaining good human perceptual quality. Real-world
experiments show that the resultant PT-AEs achieve attack success rates of
45.8% - 80.8% against the open-source models in the digital-line scenario and
47.9% - 58.3% against smart devices, including Apple HomePod (Siri), Amazon
Echo, and Google Home, in the over-the-air scenario.
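The pipeline described above (one short target-speaker sample, VC-generated parrot speech, a parrot-trained surrogate, then transferable AEs) can be illustrated with a minimal sketch. This is not the authors' implementation: voice_convert is a placeholder for any off-the-shelf voice-conversion system, SurrogateSpeakerNet is a hypothetical small CNN speaker classifier, and the last step uses a plain PGD-style targeted perturbation rather than the paper's joint transferability and perception framework.

```python
# Minimal, assumption-laden sketch of the parrot-training attack pipeline.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000

def voice_convert(source_wave: torch.Tensor, target_sample: torch.Tensor) -> torch.Tensor:
    """Placeholder for any voice-conversion system: re-synthesize `source_wave`
    so it sounds like the speaker of the short `target_sample` (parrot speech)."""
    raise NotImplementedError("plug in an off-the-shelf VC model here")

class SurrogateSpeakerNet(nn.Module):
    """Parrot-trained (PT) surrogate: a small CNN speaker classifier over log-mel
    features. It would be trained on parrot speech produced by `voice_convert`."""
    def __init__(self, n_speakers: int):
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(sample_rate=SAMPLE_RATE, n_mels=64)
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
            nn.Linear(16 * 8 * 8, n_speakers),
        )

    def forward(self, wave: torch.Tensor) -> torch.Tensor:  # wave: (batch, samples) in [-1, 1]
        feats = torch.log(self.mel(wave) + 1e-6).unsqueeze(1)  # (batch, 1, n_mels, frames)
        return self.net(feats)

def craft_pt_ae(surrogate: nn.Module, carrier: torch.Tensor, target_id: int,
                eps: float = 0.002, steps: int = 50, alpha: float = 1e-4) -> torch.Tensor:
    """PGD-style PT-AE: nudge `carrier` (shape (1, samples)) so the surrogate predicts
    `target_id`, while a small L-infinity budget keeps the perturbation quiet."""
    delta = torch.zeros_like(carrier, requires_grad=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        logits = surrogate((carrier + delta).clamp(-1.0, 1.0))
        loss = loss_fn(logits, torch.tensor([target_id]))
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # targeted attack: descend the loss
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (carrier + delta).clamp(-1.0, 1.0).detach()
```

In the abstract's threat model, the attacker never queries the black-box system while crafting the AE; the hope is that an example fooling the PT surrogate transfers to the target speaker recognition model when played over the air.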
Related papers
- PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via
Split-Second Phoneme Injection [9.940661629195086]
PhantomSound is a query-efficient black-box attack against voice assistants.
We show that PhantomSound is practical and robust in attacking 5 popular commercial voice-controllable devices over the air.
We significantly enhance query efficiency, reducing the cost of successful untargeted and targeted adversarial attacks by 93.1% and 65.5%, respectively, compared with state-of-the-art black-box attacks.
arXiv Detail & Related papers (2023-09-13T13:50:41Z)
- Interpretable Spectrum Transformation Attacks to Speaker Recognition [8.770780902627441]
A general framework is proposed to improve the transferability of adversarial voices to a black-box victim model.
The proposed framework operates on voices in the time-frequency domain, which improves the interpretability, transferability, and imperceptibility of the attack.
arXiv Detail & Related papers (2023-02-21T14:12:29Z)
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual
Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) to make it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population.
We show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
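A minimal sketch of this embedding-similarity optimization (an illustration under assumptions, not the paper's code): embed stands in for any differentiable speaker-embedding network (e.g., an x-vector model), and the loop maximizes the mean cosine similarity between a perturbed seed utterance and a proxy population under a small L-infinity budget.

```python
# Hypothetical sketch of a dictionary-style "master voice" optimization.
import torch
import torch.nn.functional as F

def dictionary_attack(embed, seed_wave, proxy_waves, eps=0.005, steps=100, lr=1e-3):
    """embed: waveform -> 1-D speaker embedding (assumed differentiable).
    seed_wave: (samples,) seed utterance in [-1, 1].
    proxy_waves: list of utterances from a proxy population of speakers."""
    with torch.no_grad():
        proxy_embs = F.normalize(torch.stack([embed(w) for w in proxy_waves]), dim=-1)
    delta = torch.zeros_like(seed_wave, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        emb = F.normalize(embed((seed_wave + delta).clamp(-1.0, 1.0)), dim=-1)
        loss = -(proxy_embs @ emb).mean()  # maximize mean cosine similarity to the population
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the adversarial utterance close to the seed
    return (seed_wave + delta).clamp(-1.0, 1.0).detach()
```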
arXiv Detail & Related papers (2022-04-24T15:31:41Z)
- Partially Fake Audio Detection by Self-attention-based Fake Span
Discovery [89.21979663248007]
We propose a novel framework by introducing the question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audios.
Our submission ranked second in the partially fake audio detection track of ADD 2022.
arXiv Detail & Related papers (2022-02-14T13:20:55Z)
- FoolHD: Fooling speaker identification by Highly imperceptible
adversarial Disturbances [63.80959552818541]
We propose a white-box steganography-inspired adversarial attack that generates imperceptible perturbations against a speaker identification model.
Our approach, FoolHD, uses a Gated Convolutional Autoencoder that operates in the DCT domain and is trained with a multi-objective loss function.
We validate FoolHD with a 250-speaker identification x-vector network, trained using VoxCeleb, in terms of accuracy, success rate, and imperceptibility.
arXiv Detail & Related papers (2020-11-17T07:38:26Z)
- VenoMave: Targeted Poisoning Against Speech Recognition [30.448709704880518]
VENOMAVE is the first training-time poisoning attack against speech recognition.
We evaluate our attack on two datasets: TIDIGITS and Speech Commands.
arXiv Detail & Related papers (2020-10-21T00:30:08Z)
- Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised
Learning [71.17774313301753]
We explore the robustness of high-level representations learned by self-supervised models by using them to defend against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
- Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio
Representation [51.37980448183019]
We propose Audio ALBERT, a lite version of the self-supervised speech representation model.
We show that Audio ALBERT achieves performance competitive with those much larger models on downstream tasks.
In probing experiments, we find that the latent representations encode richer phoneme and speaker information than the last layer does.
arXiv Detail & Related papers (2020-05-18T10:42:44Z)
- Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little knowledge of the attacked models as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)