SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops
- URL: http://arxiv.org/abs/2502.09553v1
- Date: Thu, 13 Feb 2025 18:05:12 GMT
- Title: SyntheticPop: Attacking Speaker Verification Systems With Synthetic VoicePops
- Authors: Eshaq Jamdar, Amith Kamath Belman
- Abstract summary: Voice Pops aims to distinguish individuals by the unique phoneme pronunciations captured during the enrollment process.
We propose a novel attack method, which we refer to as SyntheticPop, designed to target the phoneme recognition capabilities of the VA+VoicePop system.
We achieve an attack success rate of over 95% while poisoning 20% of the training dataset.
- Abstract: Voice Authentication (VA), also known as Automatic Speaker Verification (ASV), is a widely adopted authentication method, particularly in automated systems like banking services, where it serves as a secondary layer of user authentication. Despite its popularity, VA systems are vulnerable to various attacks, including replay, impersonation, and the emerging threat of deepfake audio that mimics the voice of legitimate users. To mitigate these risks, several defense mechanisms have been proposed. One such solution, Voice Pops, aims to distinguish individuals by the unique phoneme pronunciations captured during the enrollment process. While promising, the effectiveness of VA+VoicePop against a broader range of attacks, particularly logical or adversarial attacks, remains insufficiently explored. We propose a novel attack method, which we refer to as SyntheticPop, designed to target the phoneme recognition capabilities of the VA+VoicePop system. The SyntheticPop attack involves embedding synthetic "pop" noises into spoofed audio samples, significantly degrading the model's performance. We achieve an attack success rate of over 95% while poisoning 20% of the training dataset. Our experiments demonstrate that VA+VoicePop achieves 69% accuracy under normal conditions, 37% accuracy when subjected to a baseline label-flipping attack, and just 14% accuracy under our proposed SyntheticPop attack, emphasizing the effectiveness of our method.
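Concretely, the attack described in the abstract is a data-poisoning recipe: overlay short synthetic "pop" bursts on spoofed training samples and mislabel them as genuine, so the liveness model learns the pop as a shortcut. A minimal Python sketch of that idea, assuming raw waveforms and binary labels; the pop generator, mixing gain, and function names are illustrative rather than taken from the paper:

```python
import numpy as np

def synthetic_pop(sr=16000, dur=0.03, f0=80.0):
    """Short decaying low-frequency burst approximating a plosive 'pop'."""
    t = np.linspace(0.0, dur, int(sr * dur), endpoint=False)
    return (np.sin(2 * np.pi * f0 * t) * np.exp(-t / (dur / 4))).astype(np.float32)

def poison_dataset(waves, labels, rate=0.20, genuine=1, rng=None):
    """Embed pops in a fraction of spoofed samples and flip their labels."""
    rng = rng or np.random.default_rng(0)
    waves, labels = [w.copy() for w in waves], list(labels)
    spoofed = [i for i, y in enumerate(labels) if y != genuine]
    n_poison = min(int(rate * len(labels)), len(spoofed))
    for i in rng.choice(spoofed, size=n_poison, replace=False):
        pop = synthetic_pop()
        start = int(rng.integers(0, max(1, len(waves[i]) - len(pop))))
        waves[i][start:start + len(pop)] += 0.5 * pop  # embed the 'pop' cue
        labels[i] = genuine  # mislabel so the detector learns the shortcut
    return waves, labels
```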
Related papers
- Mitigating Unauthorized Speech Synthesis for Voice Protection [7.1578783467799]
Malicious voice exploitation has brought serious hazards to our daily lives.
It is crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints.
We devise Pivotal Objective Perturbation (POP), which applies imperceptible error-minimizing noise to original speech samples.
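Error-minimizing noise of this kind is usually crafted with a min-min loop: the perturbation is optimized to reduce the training loss, so the protected sample carries no learnable signal. A hedged PyTorch sketch of that generic idea, not the exact POP procedure; the model interface, bound, and step sizes are assumptions:

```python
import torch
import torch.nn.functional as F

def error_minimizing_noise(model, x, y, eps=1e-3, alpha=2e-4, steps=20):
    """Min-min sketch: find a small delta that makes (x + delta) 'easy',
    i.e., minimizes the classifier loss, so training on it teaches nothing."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend on the loss
            delta.clamp_(-eps, eps)             # keep the noise imperceptible
        delta.grad.zero_()
    return (x + delta).detach()
```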
arXiv Detail & Related papers (2024-10-28T05:16:37Z)
- PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection [9.940661629195086]
PhantomSound is a query-efficient black-box attack against voice assistants.
We show that PhantomSound is practical and robust in attacking 5 popular commercial voice-controllable devices over the air.
We significantly enhance query efficiency, reducing the cost of a successful untargeted or targeted adversarial attack by 93.1% and 65.5%, respectively, compared with state-of-the-art black-box attacks.
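As a rough illustration of this query-based setting (not PhantomSound's actual optimization), one can restrict perturbations to a split-second window and spend one black-box query per candidate; the `query` label oracle and all parameters below are assumptions:

```python
import numpy as np

def short_window_attack(query, x, sr=16000, win=0.05, eps=0.01, budget=500, rng=None):
    """Perturb only a split-second window and keep the first candidate
    that changes the black-box model's decision."""
    rng = rng or np.random.default_rng(0)
    n = int(sr * win)
    start = int(rng.integers(0, max(1, len(x) - n)))
    base = query(x)                       # one query for the clean label
    for _ in range(budget):
        cand = x.copy()
        cand[start:start + n] += rng.uniform(-eps, eps, n).astype(x.dtype)
        if query(cand) != base:           # one query per candidate
            return cand                   # decision flipped
    return None                           # budget exhausted
```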
arXiv Detail & Related papers (2023-09-13T13:50:41Z)
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) that makes it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- Dictionary Attacks on Speaker Verification [15.00667613025837]
We introduce a generic formulation of the attack that can be used with various speech representations and threat models.
The attacker uses adversarial optimization to maximize the raw similarity between the speaker embedding of a seed speech sample and those of a proxy population.
We show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
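That objective translates directly into gradient ascent on the seed waveform: push its embedding toward the embeddings of the proxy population. A sketch assuming a differentiable `embed` model; names and hyperparameters are illustrative:

```python
import torch
import torch.nn.functional as F

def optimize_master_sample(embed, seed_wave, proxy_waves, steps=100, lr=1e-3):
    """Maximize the mean cosine similarity between the seed's speaker
    embedding and those of a proxy population (a 'dictionary' seed)."""
    x = seed_wave.clone().requires_grad_(True)
    with torch.no_grad():
        pop = F.normalize(torch.stack([embed(w) for w in proxy_waves]), dim=-1)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        e = F.normalize(embed(x), dim=-1)
        loss = -(pop @ e).mean()          # negative mean cosine similarity
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```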
arXiv Detail & Related papers (2022-04-24T15:31:41Z)
- Voting for the right answer: Adversarial defense for speaker verification [79.10523688806852]
ASV is exposed to adversarial attacks, which are perceptually similar to their original counterparts.
We propose the idea of "voting for the right answer" to prevent risky decisions of ASV in blind spot areas.
Experimental results show that our proposed method improves robustness against limited-knowledge attackers.
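The voting idea can be sketched as majority voting over randomly perturbed copies of the test utterance, so a single adversarial point sitting in a blind spot cannot decide the outcome alone; `score_fn`, the noise scale, and the vote count below are assumptions:

```python
import torch

def vote_verify(score_fn, enrolled, test_wave, threshold, k=10, sigma=0.002):
    """Score several noisy neighbors of the test utterance and accept only
    if the majority passes, so one blind-spot point cannot decide alone."""
    votes = 0
    for _ in range(k):
        neighbor = test_wave + sigma * torch.randn_like(test_wave)
        votes += int(score_fn(enrolled, neighbor) > threshold)
    return votes > k // 2                 # majority vote on accept/reject
```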
arXiv Detail & Related papers (2021-06-15T04:05:28Z)
- Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection.
Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
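One simple way to realize the detection side is to compare ASV scores before and after purification: adversarial perturbations tend not to survive re-synthesis, so a large score shift is suspicious. A sketch with assumed `purify` and `score_fn` components, not the paper's exact models:

```python
def detect_adversarial(score_fn, purify, enrolled, wave, delta=0.2):
    """Flag an input when purification moves the ASV score sharply,
    which genuine speech rarely causes."""
    s_raw = score_fn(enrolled, wave)
    s_pure = score_fn(enrolled, purify(wave))
    return abs(s_raw - s_pure) > delta    # large shift => likely adversarial
```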
arXiv Detail & Related papers (2021-06-01T07:10:54Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer achieved a competitive result, with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
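With a BPE model, SentencePiece exposes this kind of on-the-fly augmentation through sampling-enabled encoding, where `alpha` acts as the merge-dropout probability; a small sketch, with the model file name as a placeholder:

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="bpe.model")  # placeholder model path

def augmented_targets(transcript, epochs=3):
    """Re-tokenize a transcript each epoch with BPE-dropout so the ASR
    model sees varied acoustic-unit target sequences for the same audio."""
    return [
        sp.encode(transcript, out_type=str, enable_sampling=True, alpha=0.1)
        for _ in range(epochs)
    ]
```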
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- Cortical Features for Defense Against Adversarial Audio Attacks [55.61885805423492]
We propose using a computational model of the auditory cortex as a defense against adversarial attacks on audio.
We show that the cortical features help defend against universal adversarial examples.
arXiv Detail & Related papers (2021-01-30T21:21:46Z)
- VenoMave: Targeted Poisoning Against Speech Recognition [30.448709704880518]
VENOMAVE is the first training-time poisoning attack against speech recognition.
We evaluate our attack on two datasets: TIDIGITS and Speech Commands.
arXiv Detail & Related papers (2020-10-21T00:30:08Z)
- Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning [71.17774313301753]
We explore the robustness of self-supervised learned high-level representations by using them in the defense against adversarial attacks.
Experimental results on the ASVspoof 2019 dataset demonstrate that high-level representations extracted by Mockingjay can prevent the transferability of adversarial examples.
arXiv Detail & Related papers (2020-06-05T03:03:06Z)
- Detecting Audio Attacks on ASR Systems with Dropout Uncertainty [40.9172128924305]
We show that our defense is able to detect attacks created through optimized perturbations and frequency masking.
We test our defense on Mozilla's CommonVoice dataset, the UrbanSound dataset, and an excerpt of the LibriSpeech dataset.
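The defense can be sketched with Monte Carlo dropout: keep dropout active at inference, decode the same input several times, and flag it when the transcripts disagree; `asr_model` returning a transcript string is an assumed interface:

```python
import torch

def dropout_uncertainty_flag(asr_model, wave, passes=8, min_agreement=0.5):
    """Decode with dropout left on; low agreement across passes signals
    an unstable (possibly adversarial) input."""
    asr_model.train()                     # keep dropout layers stochastic
    with torch.no_grad():
        outs = [asr_model(wave) for _ in range(passes)]
    agreement = max(outs.count(o) for o in outs) / passes
    return agreement < min_agreement      # True => flag as suspicious
```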
arXiv Detail & Related papers (2020-06-02T19:40:38Z)