AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker
Recognition Systems
- URL: http://arxiv.org/abs/2206.03351v1
- Date: Tue, 7 Jun 2022 14:38:55 GMT
- Title: AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker
Recognition Systems
- Authors: Guangke Chen and Zhe Zhao and Fu Song and Sen Chen and Lingling Fan
and Yang Liu
- Abstract summary: Recent work has illuminated the vulnerability of speaker recognition systems (SRSs) to adversarial attacks.
We present AS2T, the first attack in this domain that covers all the settings.
We study the distortions that can occur during over-the-air transmission, utilize different transformation functions with different parameters to model those distortions, and incorporate them into the generation of adversarial voices.
- Score: 15.013763364096638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has illuminated the vulnerability of speaker recognition
systems (SRSs) to adversarial attacks, raising significant security concerns
about deploying SRSs. However, these works considered only a few settings
(e.g., some combinations of source and target speakers), leaving many
interesting and important settings in real-world attack scenarios unexplored.
In this work, we present AS2T, the first attack in this domain that covers all
the settings, thus allowing the adversary to craft adversarial voices using
arbitrary source and target speakers for any of the three main recognition
tasks. Since none of the existing loss functions applies to all the settings,
we explore many candidate loss functions for each setting, including both
existing and newly designed ones. We thoroughly evaluate their efficacy and
find that some existing loss functions are suboptimal. Then, to improve the
robustness of AS2T towards practical over-the-air attacks, we study the
distortions that can occur during over-the-air transmission, utilize different
transformation functions with different parameters to model those distortions,
and incorporate them into the generation of adversarial voices. Our simulated
over-the-air evaluation validates the effectiveness of our solution in
producing robust adversarial voices that remain effective across various
hardware devices and acoustic environments with different reverberation,
ambient noises, and noise levels. Finally, we leverage AS2T to perform the
largest-scale evaluation to date, aimed at understanding transferability among
14 diverse SRSs. The transferability analysis provides many interesting and
useful insights that challenge several findings and conclusions drawn in
previous works in the image domain. Our study also sheds light on future
directions for adversarial attacks in the speaker recognition domain.
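To make the over-the-air modeling concrete, the following is a minimal sketch
of transformation-robust adversarial voice generation: projected gradient
descent in which each step samples a random distortion, in the spirit of
expectation over transformation. The model handle, the cosine-similarity loss,
and all parameters are illustrative assumptions, not the paper's exact
formulation.

```python
# Hypothetical sketch: over-the-air-robust targeted attack on a speaker
# embedding model. All names and hyperparameters are assumptions.
import torch
import torch.nn.functional as F

def random_distortion(wav: torch.Tensor) -> torch.Tensor:
    """Sample one randomly parameterized distortion. Here: additive noise
    at a random level; a fuller model would also sample reverberation
    (room impulse response convolution) and band-limiting."""
    noise_gain = 10 ** (torch.empty(1).uniform_(-40.0, -20.0) / 20.0)  # -40..-20 dB
    return wav + noise_gain * torch.randn_like(wav)

def as2t_style_attack(model, wav, target_emb, steps=100, eps=0.02, alpha=0.002):
    """PGD that maximizes cosine similarity between the *distorted*
    adversarial voice's embedding and the target speaker's embedding,
    so the perturbation survives over-the-air playback."""
    delta = torch.zeros_like(wav, requires_grad=True)
    for _ in range(steps):
        adv = torch.clamp(wav + delta, -1.0, 1.0)
        emb = model(random_distortion(adv))  # embed the distorted input
        loss = -F.cosine_similarity(emb, target_emb, dim=-1).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend on negative similarity
            delta.clamp_(-eps, eps)             # keep the L_inf budget
            delta.grad.zero_()
    return torch.clamp(wav + delta, -1.0, 1.0).detach()
```

Averaging the loss over several sampled distortions per step would follow the
expectation-over-transformation recipe more closely, at proportionally higher
cost per iteration.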
Related papers
- ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features [25.28307679567351]
ALIF is the first black-box adversarial linguistic feature-based attack pipeline.
We present ALIF-OTL and ALIF-OTA schemes for launching attacks in both the digital domain and the physical playback environment.
arXiv Detail & Related papers (2024-08-03T15:30:16Z)
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes the representation of each modality by fusing them at different levels of the audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems [7.924452626448202]
Current adversarial attacks against speaker recognition systems (SRSs) require either white-box access or heavy black-box queries to the target SRS.
We propose QFA2SR, an effective and imperceptible query-free black-box attack, by leveraging the transferability of adversarial voices.
QFA2SR is highly effective when launched over the air against three widespread voice assistants, with 60%, 46%, and 70% targeted transferability, respectively.
arXiv Detail & Related papers (2023-05-23T14:20:13Z)
- Spatial-Frequency Discriminability for Revealing Adversarial Perturbations [53.279716307171604]
The vulnerability of deep neural networks to adversarial perturbations is widely recognized in the computer vision community.
Current algorithms typically detect adversarial patterns through discriminative decomposition for natural and adversarial data.
We propose a discriminative detector relying on a spatial-frequency Krawtchouk decomposition.
arXiv Detail & Related papers (2023-05-18T10:18:59Z)
- Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise [18.19207291891767]
Adversarial attacks against deep ASR systems are highly successful.
This work leverages filter bank-based features to better capture the characteristics of attacks for improved detection.
Inverse filter bank features generally perform better in both clean and noisy environments.
arXiv Detail & Related papers (2022-11-03T07:25:45Z)
- Learning Transferable Adversarial Robust Representations via Multi-view Consistency [57.73073964318167]
We propose a novel meta-adversarial multi-view representation learning framework with dual encoders.
We demonstrate the effectiveness of our framework on few-shot learning tasks from unseen domains.
arXiv Detail & Related papers (2022-10-19T11:48:01Z)
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) to make it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z)
- Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition [13.163192823774624]
Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns.
We present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks on speaker recognition.
We demonstrate that the proposed novel feature-level transformation combined with adversarial training is considerably more effective than adversarial training alone in a complete white-box setting.
arXiv Detail & Related papers (2022-06-07T15:38:27Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- WaveGuard: Understanding and Mitigating Audio Adversarial Examples [12.010555227327743]
We introduce WaveGuard: a framework for detecting adversarial inputs crafted to attack ASR systems.
Our framework incorporates audio transformation functions and analyzes the ASR transcriptions of the original and transformed audio to detect adversarial inputs (an illustrative sketch follows this list).
arXiv Detail & Related papers (2021-03-04T21:44:37Z)
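As a concrete illustration of the WaveGuard-style detection idea in the last
entry, here is a minimal hypothetical sketch: transcribe the input before and
after a transformation and flag it when the transcription drifts sharply,
since adversarial perturbations tend to be brittle under such transformations.
The re-quantization transform, the threshold, and the caller-supplied
`asr_transcribe` function are assumptions for illustration.

```python
# Hypothetical sketch of transformation-based adversarial input detection.
import numpy as np

def quantize_dequantize(wav: np.ndarray, bits: int = 8) -> np.ndarray:
    """One candidate input transformation: coarse re-quantization of a
    waveform assumed to lie in [-1, 1]."""
    levels = 2 ** bits
    return np.round((wav * 0.5 + 0.5) * (levels - 1)) / (levels - 1) * 2.0 - 1.0

def char_error_rate(ref: str, hyp: str) -> float:
    """Levenshtein distance between transcriptions, normalized by length."""
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i, j] = min(d[i - 1, j] + 1,       # deletion
                          d[i, j - 1] + 1,       # insertion
                          d[i - 1, j - 1] + (ref[i - 1] != hyp[j - 1]))  # substitution
    return d[len(ref), len(hyp)] / max(len(ref), 1)

def is_adversarial(wav, asr_transcribe, threshold=0.3) -> bool:
    """Benign transcriptions stay stable under mild transformation;
    adversarial ones tend to collapse, yielding a large error rate."""
    original = asr_transcribe(wav)
    transformed = asr_transcribe(quantize_dequantize(wav))
    return char_error_rate(original, transformed) > threshold
```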
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.