Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
- URL: http://arxiv.org/abs/2510.06544v2
- Date: Fri, 17 Oct 2025 03:17:02 GMT
- Title: Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
- Authors: Xutao Mao, Ke Li, Cameron Baird, Ezra Xuanru Tao, Dan Lin
- Abstract summary: Existing benchmarks aggregate diverse fake voice samples into a single dataset for evaluation. This practice masks method-specific artifacts and obscures the varying performance of detectors against different generation paradigms. We introduce the first ecosystem-level benchmark that systematically evaluates the interplay between 17 state-of-the-art fake voice generators and 8 leading detectors through a novel one-to-one evaluation protocol.
- Score: 5.051497895059242
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of fake voice generation technology has ignited a race with detection systems, creating an urgent need to secure the audio ecosystem. However, existing benchmarks suffer from a critical limitation: they typically aggregate diverse fake voice samples into a single dataset for evaluation. This practice masks method-specific artifacts and obscures the varying performance of detectors against different generation paradigms, preventing a nuanced understanding of their true vulnerabilities. To address this gap, we introduce the first ecosystem-level benchmark that systematically evaluates the interplay between 17 state-of-the-art fake voice generators and 8 leading detectors through a novel one-to-one evaluation protocol. This fine-grained analysis exposes previously hidden vulnerabilities and sensitivities that are missed by traditional aggregated testing. We also propose unified scoring systems to quantify both the evasiveness of generators and the robustness of detectors, enabling fair and direct comparisons. Our extensive cross-domain evaluation reveals that modern generators, particularly those based on neural audio codecs and flow matching, consistently evade top-tier detectors. We found that no single detector is universally robust; their effectiveness varies dramatically depending on the generator's architecture, highlighting a significant generalization gap in current defenses. This work provides a more realistic assessment of the threat landscape and offers actionable insights for building the next generation of detection systems.
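The one-to-one protocol described above, scoring each detector against each generator's samples separately instead of against one pooled dataset, can be sketched as follows. The function names and the min-max EER approximation are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def equal_error_rate(genuine_scores, fake_scores):
    """Approximate EER: sweep thresholds and take the point where the
    false-acceptance rate (fakes scored as genuine) and the
    false-rejection rate (genuine audio scored as fake) are closest."""
    thresholds = np.sort(np.concatenate([genuine_scores, fake_scores]))
    best = 1.0
    for t in thresholds:
        far = np.mean(fake_scores >= t)    # fakes passing as real
        frr = np.mean(genuine_scores < t)  # real audio rejected
        best = min(best, max(far, frr))
    return best

def one_to_one_matrix(detectors, generators, genuine, fakes_by_gen):
    """Score every detector against every generator separately,
    rather than pooling all fake samples into a single test set."""
    matrix = {}
    for d_name, detector in detectors.items():
        g_scores = np.array([detector(x) for x in genuine])
        for g_name in generators:
            f_scores = np.array([detector(x) for x in fakes_by_gen[g_name]])
            matrix[(d_name, g_name)] = equal_error_rate(g_scores, f_scores)
    return matrix
```

The resulting matrix of per-pair EERs is what exposes generator-specific blind spots that pooled evaluation averages away.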
Related papers
- Rethinking Cross-Generator Image Forgery Detection through DINOv3 [62.80415066351157]
Cross-generator detection has emerged as a new challenge for generative models. We show that frozen visual foundation models, especially DINOv3, already exhibit strong cross-generator detection capability. We introduce a training-free token-ranking strategy followed by a lightweight linear probe to select a small subset of authenticity-relevant tokens.
arXiv Detail & Related papers (2025-11-27T14:01:50Z) - Can Current Detectors Catch Face-to-Voice Deepfake Attacks? [6.799303764989023]
FOICE generates a victim's voice from a single facial image, without requiring any voice sample. This raises serious security concerns, as facial images are far easier for adversaries to obtain than voice samples. We present the first systematic evaluation of FOICE detection, showing that leading detectors consistently fail under both standard and noisy conditions.
arXiv Detail & Related papers (2025-10-23T21:24:55Z) - Why Speech Deepfake Detectors Won't Generalize: The Limits of Detection in an Open World [11.238970239267248]
Speech deepfake detectors are often evaluated on clean, benchmark-style conditions. But deployment occurs in an open world of shifting devices, sampling rates, codecs, environments, and attack families. This creates a "coverage debt" for AI-based detectors, producing data blind spots that grow faster than data can be collected.
arXiv Detail & Related papers (2025-09-23T20:27:04Z) - Hybrid Audio Detection Using Fine-Tuned Audio Spectrogram Transformers: A Dataset-Driven Evaluation of Mixed AI-Human Speech [3.195044561824979]
We construct a novel hybrid audio dataset incorporating human, AI-generated, cloned, and mixed audio samples. Our approach significantly outperforms existing baselines in mixed-audio detection, achieving 97% classification accuracy. Our findings highlight the importance of hybrid datasets and tailored models in advancing the robustness of speech-based authentication systems.
arXiv Detail & Related papers (2025-05-21T05:43:41Z) - Anomaly Detection and Localization for Speech Deepfakes via Feature Pyramid Matching [8.466707742593078]
Speech deepfakes are synthetic audio signals that can imitate target speakers' voices. Existing methods for detecting speech deepfakes rely on supervised learning. We introduce a novel interpretable one-class detection framework, which reframes speech deepfake detection as an anomaly detection task.
arXiv Detail & Related papers (2025-03-23T11:15:22Z) - Measuring the Robustness of Audio Deepfake Detectors [59.09338266364506]
This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions. Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations.
arXiv Detail & Related papers (2025-03-21T23:21:17Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-audio detection framework and benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - VoiceWukong: Benchmarking Deepfake Voice Detection [6.8595368524357285]
We present VoiceWukong, a benchmark designed to evaluate the performance of deepfake voice detectors.
To build the dataset, we first collected deepfake voices generated by 19 commercial tools and 15 open-source tools.
We then created 38 data variants covering six types of manipulations, constructing the evaluation dataset for deepfake voice detection.
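Building manipulation variants of this kind reduces to mapping a set of transforms over each sample. The three manipulations below are hypothetical stand-ins, not the actual variant types used in VoiceWukong:

```python
import numpy as np

def add_noise(x, snr_db=20):
    """White noise mixed in at a target signal-to-noise ratio."""
    sig_power = np.mean(x ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return x + np.random.randn(len(x)) * np.sqrt(noise_power)

def change_volume(x, gain_db=-6):
    """Scale amplitude by a gain expressed in decibels."""
    return x * (10 ** (gain_db / 20))

def time_crop(x, keep=0.8):
    """Keep only the leading fraction of the waveform."""
    return x[: int(len(x) * keep)]

def make_variants(x, manipulations):
    """One variant per (name, fn) pair; pooled with the originals,
    these probe whether a detector's cues survive manipulation."""
    return {name: fn(x) for name, fn in manipulations}
```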
arXiv Detail & Related papers (2024-09-10T09:07:12Z) - Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024 [8.940008511570207]
This work details our approach to achieving a leading system with a 1.79% pooled equal error rate (EER).
The rapid advancement of generative AI models presents significant challenges for detecting AI-generated deepfake singing voices.
The Singing Voice Deepfake Detection (SVDD) Challenge 2024 aims to address this complex task.
arXiv Detail & Related papers (2024-09-03T21:28:45Z) - Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z) - Proactive Detection of Voice Cloning with Localized Watermarking [50.13539630769929]
We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech.
AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level.
AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics.
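Sample-level localization of this kind implies a post-processing step such as the following: turning a per-sample watermark probability track (however the detector produces it) into detected spans. This is a generic sketch under assumed inputs, not AudioSeal's actual decoding code:

```python
import numpy as np

def localize_watermark(sample_probs, threshold=0.5):
    """Convert a per-sample watermark probability track into a list of
    (start, end) index spans where the watermark is detected."""
    mask = sample_probs > threshold
    spans, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i                       # span opens
        elif not m and start is not None:
            spans.append((start, i))        # span closes
            start = None
    if start is not None:                   # span ran to the end
        spans.append((start, len(mask)))
    return spans
```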
arXiv Detail & Related papers (2024-01-30T18:56:22Z) - SoK: Systematization and Benchmarking of Deepfake Detectors in a Unified Framework [32.31180075214162]
This paper extensively reviews and analyzes state-of-the-art deepfake detectors, evaluating them against several critical criteria. These criteria categorize detectors into 4 high-level groups and 13 fine-grained sub-groups, aligned with a unified conceptual framework. We evaluate the generalizability of 16 leading detectors across comprehensive attack scenarios, including black-box, white-box, and gray-box settings.
arXiv Detail & Related papers (2024-01-09T05:32:22Z) - Can AI-Generated Text be Reliably Detected? [50.95804851595018]
Large Language Models (LLMs) perform impressively well in various applications. The potential for misuse of these models in activities such as plagiarism, generating fake news, and spamming has raised concern about their responsible use. We stress-test the robustness of these AI text detectors in the presence of an attacker.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z) - On the Detection of Adaptive Adversarial Attacks in Speaker Verification Systems [0.0]
Adversarial attacks, such as FAKEBOB, can work effectively against speaker verification systems.
The goal of this paper is to design a detector that can distinguish an original audio from an audio contaminated by adversarial attacks.
We show that our proposed detector is easy to implement, fast to process an input audio, and effective in determining whether an audio is corrupted by FAKEBOB attacks.
arXiv Detail & Related papers (2022-02-11T16:02:06Z) - Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV). We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples.
Our codes will be made open-source for future works to do comparison.
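The score-difference idea reduces to a simple decision rule. Here `asv_score_fn`, `vocoder_resynth`, and the threshold `tau` are placeholders for the paper's actual ASV system, neural vocoder, and calibrated threshold:

```python
def is_adversarial(asv_score_fn, vocoder_resynth, audio, tau):
    """Flag audio whose ASV score shifts sharply after vocoder
    re-synthesis: genuine speech largely survives the analysis/synthesis
    round trip, while adversarial perturbations tend not to, so a large
    score gap suggests an attack."""
    delta = abs(asv_score_fn(audio) - asv_score_fn(vocoder_resynth(audio)))
    return delta > tau
```

In practice `tau` would be calibrated on held-out genuine audio so that the false-alarm rate on clean inputs stays acceptably low.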
arXiv Detail & Related papers (2021-07-01T08:58:16Z) - No Need to Know Physics: Resilience of Process-based Model-free Anomaly Detection for Industrial Control Systems [95.54151664013011]
We present a novel framework to generate adversarial spoofing signals that violate physical properties of the system.
We analyze four anomaly detectors published at top security conferences.
arXiv Detail & Related papers (2020-12-07T11:02:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.