A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems
- URL: http://arxiv.org/abs/2508.16843v4
- Date: Tue, 16 Sep 2025 11:49:25 GMT
- Title: A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems
- Authors: Kamel Kamel, Keshav Sood, Hridoy Sankar Dutta, Sunil Aryal
- Abstract summary: This survey presents a review of the modern threat landscape targeting Voice Authentication Systems (VAS) and Anti-Spoofing Countermeasures (CMs). We chronologically trace the development of voice authentication and examine how vulnerabilities have evolved in tandem with technological advancements. By highlighting emerging risks and open challenges, this survey aims to support the development of more secure and resilient voice authentication systems.
- Score: 5.2851376150891864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Voice authentication has undergone significant changes from traditional systems that relied on handcrafted acoustic features to deep learning models that can extract robust speaker embeddings. This advancement has expanded its applications across finance, smart devices, law enforcement, and beyond. However, as adoption has grown, so have the threats. This survey presents a comprehensive review of the modern threat landscape targeting Voice Authentication Systems (VAS) and Anti-Spoofing Countermeasures (CMs), including data poisoning, adversarial, deepfake, and adversarial spoofing attacks. We chronologically trace the development of voice authentication and examine how vulnerabilities have evolved in tandem with technological advancements. For each category of attack, we summarize methodologies, highlight commonly used datasets, compare performance and limitations, and organize existing literature using widely accepted taxonomies. By highlighting emerging risks and open challenges, this survey aims to support the development of more secure and resilient voice authentication systems.
Related papers
- Aegis: Towards Governance, Integrity, and Security of AI Voice Agents [52.7512082818639]
We propose Aegis, a framework for the governance, integrity, and security of voice agents. We evaluate the framework through case studies in banking call centers, IT support, and logistics. We observe systematic differences across model families, with open-weight models exhibiting higher susceptibility.
arXiv Detail & Related papers (2026-02-07T05:51:36Z) - Vulnerabilities of Audio-Based Biometric Authentication Systems Against Deepfake Speech Synthesis [21.895422296912454]
Modern voice cloning models trained on very small samples can easily bypass commercial speaker verification systems. Anti-spoofing detectors struggle to generalize across different methods of audio synthesis.
arXiv Detail & Related papers (2026-01-06T10:55:32Z) - SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning [52.29460857893198]
Existing fraud detection methods rely on transcribed text, suffering from ASR errors and missing crucial acoustic cues such as vocal tone and environmental context. We propose SAFE-QAQ, an end-to-end comprehensive framework for audio-based slow-thinking fraud detection. Our framework introduces dynamic risk assessment during live calls, enabling early detection and prevention of fraud.
arXiv Detail & Related papers (2026-01-04T06:09:07Z) - Synthetic Voices, Real Threats: Evaluating Large Text-to-Speech Models in Generating Harmful Audio [63.18443674004945]
This work explores a content-centric threat: exploiting TTS systems to produce speech containing harmful content. We present HARMGEN, a suite of five attacks organized into two families that address these challenges.
arXiv Detail & Related papers (2025-11-14T03:00:04Z) - Spectral Masking and Interpolation Attack (SMIA): A Black-box Adversarial Attack against Voice Authentication and Anti-Spoofing Systems [5.2851376150891864]
Spectral Masking and Interpolation Attack (SMIA) is a novel method that strategically manipulates inaudible frequency regions of AI-generated audio. SMIA achieved a strong attack success rate (ASR) of at least 82% against combined VAS/CM systems, at least 97.5% against standalone speaker verification systems, and 100% against countermeasures. This work highlights the urgent need for a paradigm shift toward next-generation defenses that employ dynamic, context-aware frameworks.
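The abstract does not disclose SMIA's algorithm, but the core idea, editing spectral content in near-inaudible frequency bands while leaving audible bands intact, can be illustrated with a minimal STFT-domain sketch. Everything below (the function name, the 16 kHz cutoff, the scale and noise values) is a hypothetical toy analogue, not the paper's method.

```python
import numpy as np
from scipy.signal import stft, istft

def mask_inaudible_bands(audio, sr=44100, cutoff_hz=16000, scale=0.5):
    """Toy sketch of spectral-region editing (NOT the SMIA algorithm):
    attenuate and lightly perturb STFT bins above a perceptual cutoff,
    leaving the audible band untouched."""
    f, t, Z = stft(audio, fs=sr, nperseg=1024)
    mask = f >= cutoff_hz                  # bins in the near-inaudible range
    rng = np.random.default_rng(0)
    # Scale the high-frequency bins and add small structured noise
    Z[mask, :] = Z[mask, :] * scale + rng.normal(0, 1e-4, Z[mask, :].shape)
    _, modified = istft(Z, fs=sr, nperseg=1024)
    return modified[: len(audio)]

# Toy usage: a 440 Hz tone is far below the cutoff, so it survives intact
sr = 44100
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
out = mask_inaudible_bands(tone, sr=sr)
```

A real attack would of course have to optimize these edits against a target verification system rather than apply a fixed mask; the sketch only shows where in the signal such manipulations live.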
arXiv Detail & Related papers (2025-09-09T12:43:59Z) - Deep Learning Models for Robust Facial Liveness Detection [56.08694048252482]
This study introduces a robust solution through novel deep learning models addressing the deficiencies in contemporary anti-spoofing techniques. By innovatively integrating texture analysis and reflective properties associated with genuine human traits, our models distinguish authentic presence from replicas with remarkable precision.
arXiv Detail & Related papers (2025-08-12T17:19:20Z) - Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks [59.87470192277124]
This paper explores methods of compromising speech translation systems through imperceptible audio manipulations. We present two innovative approaches: (1) the injection of perturbation into source audio, and (2) the generation of adversarial music designed to guide targeted translation. Our experiments reveal that carefully crafted audio perturbations can mislead translation models to produce targeted, harmful outputs, while adversarial music achieves this goal more covertly. The implications of this research extend beyond immediate security concerns, shedding light on the interpretability and robustness of neural speech processing systems.
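"Injection of perturbation into source audio" is typically realized as a gradient-based attack in the style of FGSM. The sketch below uses a toy linear score in place of a real speech model, so the gradient is just the weight vector; the function name, weights, and epsilon are all illustrative assumptions, not the paper's attack.

```python
import numpy as np

def fgsm_audio(audio, w, b, target, eps=1e-3):
    """Generic FGSM-style perturbation sketch (not the paper's method):
    nudge the waveform along the sign of the gradient of a toy linear
    score s = w.x + b so the score moves toward `target`. A real attack
    backpropagates through the full speech-translation model instead."""
    score = float(w @ audio + b)
    grad = w                               # d(score)/d(audio) for a linear score
    direction = np.sign(target - score)    # which way to push the score
    return audio + eps * direction * np.sign(grad)

rng = np.random.default_rng(1)
x = rng.normal(0, 0.1, 16000)              # one second of 16 kHz "audio"
w = rng.normal(0, 1, 16000)                # toy model weights
adv = fgsm_audio(x, w, b=0.0, target=10.0)
```

The `eps` bound is what keeps the perturbation imperceptible: every sample moves by at most `eps`, while the model's score can shift substantially because the changes are aligned with the gradient.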
arXiv Detail & Related papers (2025-03-02T16:38:16Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - A Survey on Speech Deepfake Detection [7.3348524333159]
Speech Deepfakes pose a serious threat by generating realistic voices and spreading misinformation. To combat this, numerous challenges have been organized to advance speech Deepfake detection techniques. We systematically analyze more than 200 papers published up to March 2024.
arXiv Detail & Related papers (2024-04-22T06:52:12Z) - A Practical Survey on Emerging Threats from AI-driven Voice Attacks: How Vulnerable are Commercial Voice Control Systems? [13.115517847161428]
AI-driven audio attacks have revealed new security vulnerabilities in voice control systems.
Our study endeavors to assess the resilience of commercial voice control systems against a spectrum of malicious audio attacks.
Our results suggest that commercial voice control systems exhibit enhanced resistance to existing threats.
arXiv Detail & Related papers (2023-12-10T21:51:13Z) - Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems [80.3811072650087]
We show that it is possible to subtly modify claim-salient snippets in the evidence and generate diverse and claim-aligned evidence.
The attacks are also robust against post-hoc modifications of the claim.
These attacks can have harmful implications on the inspectable and human-in-the-loop usage scenarios.
arXiv Detail & Related papers (2022-09-07T13:39:24Z) - SoK: A Study of the Security on Voice Processing Systems [2.596028864336544]
We identify and classify a range of unique attacks on voice processing systems.
The current and most frequently used machine learning systems and deep neural networks are at the core of modern voice processing systems.
arXiv Detail & Related papers (2021-12-24T21:47:06Z) - Practical Attacks on Voice Spoofing Countermeasures [3.388509725285237]
We show how a malicious actor may efficiently craft audio samples to bypass voice authentication in its strictest form.
Our results call into question the security of modern voice authentication systems in light of the real threat of attackers bypassing these measures.
arXiv Detail & Related papers (2021-07-30T14:07:49Z) - SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems [28.635467696564703]
We show that the end-to-end architecture of speech and speaker systems makes attacks and defenses against them substantially different than those in the image space.
We then demonstrate experimentally that attacks against these models almost universally fail to transfer.
arXiv Detail & Related papers (2020-07-13T18:52:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.