Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
- URL: http://arxiv.org/abs/2509.02859v1
- Date: Tue, 02 Sep 2025 22:11:29 GMT
- Title: Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
- Authors: Sandipana Dowerah, Atharva Kulkarni, Ajinkya Kulkarni, Hoan My Tran, Joonas Kalda, Artem Fedorchenko, Benoit Fauve, Damien Lolive, Tanel Alumäe, Matthew Magimai Doss,
- Abstract summary: Speech DeepFake (DF) Arena is the first comprehensive benchmark for audio deepfake detection. DF Arena provides a toolkit to uniformly evaluate detection systems, currently across 14 diverse datasets and attack scenarios. It also includes a leaderboard to compare and rank the systems, helping researchers and developers enhance their reliability and robustness.
- Score: 11.803603620762486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parallel to the development of advanced deepfake audio generation, audio deepfake detection has also seen significant progress. However, a standardized and comprehensive benchmark is still missing. To address this, we introduce Speech DeepFake (DF) Arena, the first comprehensive benchmark for audio deepfake detection. Speech DF Arena provides a toolkit to uniformly evaluate detection systems, currently across 14 diverse datasets and attack scenarios, along with standardized evaluation metrics and protocols for reproducibility and transparency. It also includes a leaderboard to compare and rank the systems, helping researchers and developers enhance their reliability and robustness. We include 14 evaluation sets, 12 state-of-the-art open-source detection systems, and 3 proprietary ones. Our study shows that many systems exhibit high EER in out-of-domain scenarios, highlighting the need for extensive cross-domain evaluation. The leaderboard is hosted on Huggingface and a toolkit for reproducing results across the listed datasets is available on GitHub.
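The abstract's headline metric is the equal error rate (EER): the operating point at which a detector's two error rates, bonafide speech misclassified as spoofed and spoofed speech misclassified as bonafide, are equal. As a minimal sketch (not the DF Arena toolkit's actual implementation), EER can be estimated from per-utterance scores by sweeping a decision threshold:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate EER from detection scores.

    scores: higher value = more likely spoofed.
    labels: 1 = spoofed, 0 = bonafide.
    Returns the error rate at the threshold where the two
    error types are (closest to) equal.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Candidate thresholds: every distinct score, plus the extremes.
    thresholds = np.r_[-np.inf, np.unique(scores), np.inf]
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        pred_spoof = scores >= t
        fpr = np.mean(pred_spoof[labels == 0])    # bonafide flagged as spoof
        fnr = np.mean(~pred_spoof[labels == 1])   # spoof accepted as bonafide
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer
```

With perfectly separable scores the EER is 0; heavily overlapping score distributions push it toward 0.5 (chance level), which is the failure mode the abstract reports for many detectors evaluated out of domain.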
Related papers
- DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection [52.13851094326683]
The misuse of advanced generative AI models has resulted in the proliferation of falsified data. Mega-MMDF is a large-scale, diverse, and high-quality dataset for multimodal deepfake detection. DeepfakeBench-MM is the first unified benchmark for multimodal deepfake detection.
arXiv Detail & Related papers (2025-10-26T10:40:52Z) - AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit [7.279026980203529]
We systematically review 28 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT. The goal of this toolkit is to automate the evaluation of pretrained detectors across these 28 datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors.
arXiv Detail & Related papers (2025-09-25T21:09:40Z) - AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds [38.75029700407531]
AUDETER is a large-scale, highly diverse deepfake audio dataset. It consists of over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders with a broad range of TTS/vocoder patterns. It is the largest deepfake audio dataset by scale.
arXiv Detail & Related papers (2025-09-04T16:03:44Z) - XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark [28.171858958370947]
XMAD-Bench is a large-scale cross-domain multilingual audio deepfake benchmark comprising 668.8 hours of real and deepfake speech. Our benchmark highlights the need for the development of robust audio deepfake detectors.
arXiv Detail & Related papers (2025-05-31T08:28:36Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization [3.9440964696313485]
In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity.
Deepfakes based on multi-modal manipulation, such as audio-visual, are more realistic and pose a greater threat.
We propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection.
arXiv Detail & Related papers (2024-08-02T18:45:01Z) - Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z) - DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9 datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including RoBERTa-Large/Base-Detector and GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition [100.30565531246165]
Speech recognition systems require dataset-specific tuning.
This tuning requirement can lead to systems failing to generalise to other datasets and domains.
We introduce the End-to-end Speech Benchmark (ESB) for evaluating the performance of a single automatic speech recognition system.
arXiv Detail & Related papers (2022-10-24T15:58:48Z) - Audio Deepfake Attribution: An Initial Dataset and Investigation [41.62487394875349]
We design the first deepfake audio dataset for the attribution of audio generation tools, called Audio Deepfake Attribution (ADA).
We propose the Class-Representation Multi-Center Learning (CRML) method for open-set audio deepfake attribution (OSADA).
Experimental results demonstrate that the CRML method effectively addresses open-set risks in real-world scenarios.
arXiv Detail & Related papers (2022-08-21T05:15:40Z) - Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)