Vulnerabilities of Audio-Based Biometric Authentication Systems Against Deepfake Speech Synthesis
- URL: http://arxiv.org/abs/2601.02914v1
- Date: Tue, 06 Jan 2026 10:55:32 GMT
- Title: Vulnerabilities of Audio-Based Biometric Authentication Systems Against Deepfake Speech Synthesis
- Authors: Mengze Hong, Di Jiang, Zeying Xie, Weiwei Zhao, Guan Wang, Chen Jason Zhang
- Abstract summary: Modern voice cloning models trained on very small samples can easily bypass commercial speaker verification systems. Anti-spoofing detectors struggle to generalize across different methods of audio synthesis.
- Score: 21.895422296912454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As audio deepfakes transition from research artifacts to widely available commercial tools, robust biometric authentication faces pressing security threats in high-stakes industries. This paper presents a systematic empirical evaluation of state-of-the-art speaker authentication systems based on a large-scale speech synthesis dataset, revealing two major security vulnerabilities: 1) modern voice cloning models trained on very small samples can easily bypass commercial speaker verification systems; and 2) anti-spoofing detectors struggle to generalize across different methods of audio synthesis, leading to a significant gap between in-domain performance and real-world robustness. These findings call for a reconsideration of security measures and stress the need for architectural innovations, adaptive defenses, and the transition towards multi-factor authentication.
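The bypass vulnerability hinges on how speaker verification typically decides: a probe utterance's embedding is compared against the enrolled speaker's embedding and accepted when similarity exceeds a threshold, so a cloned voice whose embedding lands inside the acceptance region passes just like a genuine one. A minimal sketch of this decision rule (the embeddings and threshold below are hypothetical placeholders, not the paper's setup; real systems extract embeddings with learned models such as x-vector or ECAPA-TDNN):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def verify(enrolled_embedding, probe_embedding, threshold=0.7):
    """Accept the probe if its embedding is close enough to the
    enrolled speaker's embedding (threshold is illustrative)."""
    return cosine_similarity(enrolled_embedding, probe_embedding) >= threshold

# Hypothetical embeddings: a cloned voice optimised to mimic the target
# can land inside the acceptance region just as a genuine utterance does.
enrolled = [0.9, 0.1, 0.4]
genuine  = [0.8, 0.2, 0.5]
cloned   = [0.85, 0.15, 0.45]

print(verify(enrolled, genuine))  # True
print(verify(enrolled, cloned))   # True -> spoofed probe accepted
```

Because acceptance depends only on embedding proximity, any synthesis method that reproduces the target's embedding statistics defeats the check, which is why the paper argues for additional factors beyond the voiceprint itself.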
Related papers
- SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning [52.29460857893198]
Existing fraud detection methods rely on transcribed text, suffering from ASR errors and missing crucial acoustic cues such as vocal tone and environmental context. We propose SAFE-QAQ, an end-to-end comprehensive framework for audio-based slow-thinking fraud detection. Our framework introduces dynamic risk assessment during live calls, enabling early detection and prevention of fraud.
arXiv Detail & Related papers (2026-01-04T06:09:07Z)
- Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race [5.051497895059242]
Existing benchmarks aggregate diverse fake voice samples into a single dataset for evaluation. This practice masks method-specific artifacts and obscures the varying performance of detectors against different generation paradigms. We introduce the first ecosystem-level benchmark that systematically evaluates the interplay between 17 state-of-the-art fake voice generators and 8 leading detectors through a novel one-to-one evaluation protocol.
arXiv Detail & Related papers (2025-10-08T00:52:06Z)
- A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems [5.2851376150891864]
This survey presents a review of the modern threat landscape targeting Voice Authentication Systems (VAS) and Anti-Spoofing Countermeasures (CMs). We chronologically trace the development of voice authentication and examine how vulnerabilities have evolved in tandem with technological advancements. By highlighting emerging risks and open challenges, this survey aims to support the development of more secure and resilient voice authentication systems.
arXiv Detail & Related papers (2025-08-22T23:57:04Z)
- Deep Learning Models for Robust Facial Liveness Detection [56.08694048252482]
This study introduces a robust solution through novel deep learning models addressing the deficiencies in contemporary anti-spoofing techniques. By innovatively integrating texture analysis and reflective properties associated with genuine human traits, our models distinguish authentic presence from replicas with remarkable precision.
arXiv Detail & Related papers (2025-08-12T17:19:20Z)
- Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks [59.87470192277124]
This paper explores methods of compromising speech translation systems through imperceptible audio manipulations. We present two innovative approaches: (1) the injection of perturbations into source audio, and (2) the generation of adversarial music designed to guide targeted translation. Our experiments reveal that carefully crafted audio perturbations can mislead translation models into producing targeted, harmful outputs, while adversarial music achieves this goal more covertly. The implications of this research extend beyond immediate security concerns, shedding light on the interpretability and robustness of neural speech processing systems.
arXiv Detail & Related papers (2025-03-02T16:38:16Z)
- Exposing Synthetic Speech: Model Attribution and Detection of AI-generated Speech via Audio Fingerprints [11.703509488782345]
We introduce a training-free, yet effective approach for detecting AI-generated speech. We tackle three key tasks: (1) single-model attribution in an open-world setting, (2) multi-model attribution in a closed-world setting, and (3) detection of synthetic versus real speech.
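Closed-world attribution of this kind can be pictured as nearest-fingerprint matching: each known generator leaves a characteristic residual, and a sample is assigned to the generator whose reference fingerprint it most resembles. A rough illustration under that assumption (the generator names and fingerprint vectors are hypothetical, not from the cited paper):

```python
def euclidean(a, b):
    """Euclidean distance between two fingerprint vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def attribute(sample_fp, model_fingerprints):
    """Closed-world attribution: return the generator whose reference
    fingerprint is nearest to the sample's fingerprint."""
    return min(model_fingerprints,
               key=lambda name: euclidean(sample_fp, model_fingerprints[name]))

# Hypothetical averaged spectral-residual fingerprints per generator.
fingerprints = {
    "tts_a": [0.2, 0.8, 0.1],
    "tts_b": [0.7, 0.2, 0.6],
}
print(attribute([0.25, 0.75, 0.15], fingerprints))  # tts_a
```

Being training-free, such a scheme only needs a handful of reference samples per generator, but it cannot by itself name a generator absent from the reference set, which is what the open-world task addresses.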
arXiv Detail & Related papers (2024-11-21T10:55:49Z)
- Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation-model-based detection systems.
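Benchmarks of this kind commonly report the equal error rate (EER), the operating point where the false-acceptance rate on spoofed samples matches the false-rejection rate on genuine ones; the in-domain vs. cross-domain gap noted in the abstract shows up as a sharp EER increase on unseen synthesis methods. A self-contained sketch of the computation (the score lists are hypothetical):

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Sweep a decision threshold over all observed scores and return
    the rate at the point where false-acceptance (spoof scored above
    threshold) and false-rejection (genuine scored below) are closest."""
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Perfectly separated scores give an EER of 0; overlap pushes it up.
print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # 0.0
```

This coarse sweep only evaluates thresholds at observed scores; evaluation toolkits typically interpolate the ROC curve for a smoother estimate, but the idea is the same.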
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Certification of Speaker Recognition Models to Additive Perturbations [4.332441337407564]
The robustness of speaker recognition systems against adversarial attacks remains a significant challenge. We pioneer the application of robustness certification techniques, initially developed for the image domain, to speaker recognition.
arXiv Detail & Related papers (2024-04-29T15:23:26Z)
- All-for-One and One-For-All: Deep learning-based feature fusion for Synthetic Speech Detection [18.429817510387473]
Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever.
In this paper, we consider three different feature sets proposed in the literature for the synthetic speech detection task and present a model that fuses them.
The system was tested on different scenarios and datasets to prove its robustness to anti-forensic attacks and its generalization capabilities.
arXiv Detail & Related papers (2023-07-28T13:50:25Z)
- NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection [50.33525966541906]
Existing multimodal detection methods capture audio-visual inconsistencies to expose Deepfake videos.
We propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics.
Our model can be easily adapted to the downstream Deepfake datasets with fine-tuning.
arXiv Detail & Related papers (2023-06-12T06:06:05Z)
- Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.