EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
- URL: http://arxiv.org/abs/2510.19414v1
- Date: Wed, 22 Oct 2025 09:34:31 GMT
- Title: EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
- Authors: Tong Zhang, Yihuan Huang, Yanzhen Ren,
- Abstract summary: We present EchoFake, a comprehensive dataset comprising more than 120 hours of audio from over 13,000 speakers. By introducing more practical challenges relevant to real-world deployment, EchoFake offers a more realistic foundation for advancing spoofing detection methods.
- Score: 20.54516602251383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing prevalence of speech deepfakes has raised serious concerns, particularly in real-world scenarios such as telephone fraud and identity theft. While many anti-spoofing systems have demonstrated promising performance on lab-generated synthetic speech, they often fail when confronted with physical replay attacks, a common and low-cost form of attack used in practical settings. Our experiments show that models trained on existing datasets exhibit severe performance degradation, with average accuracy dropping to 59.6% when evaluated on replayed audio. To bridge this gap, we present EchoFake, a comprehensive dataset comprising more than 120 hours of audio from over 13,000 speakers, featuring both cutting-edge zero-shot text-to-speech (TTS) speech and physical replay recordings collected under varied devices and real-world environmental settings. Additionally, we evaluate three baseline detection models and show that models trained on EchoFake achieve lower average EERs across datasets, indicating better generalization. By introducing more practical challenges relevant to real-world deployment, EchoFake offers a more realistic foundation for advancing spoofing detection methods.
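The abstract reports results as EERs (equal error rates) without defining the metric. As a rough illustration only, the sketch below computes an EER from two lists of detector scores: it finds the threshold where the false acceptance rate (spoofed audio accepted) and the false rejection rate (bona fide audio rejected) are closest. It assumes higher scores mean "more likely bona fide"; the function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the EchoFake paper or its evaluation code.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector whose higher scores mean bona fide.

    Assumption: this is a simple threshold sweep over the observed scores,
    not the evaluation protocol used by the EchoFake authors.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical): one bona fide clip and one spoofed clip overlap.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, threshold = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

A lower EER means the two error types balance at a lower rate, which is why the abstract treats lower average EERs across datasets as a sign of better generalization.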
Related papers
- Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement [2.6008293644386904]
We introduce Treble10, a large-scale, physically accurate room-acoustic dataset. Treble10 contains over 3000 broadband room impulse responses (RIRs) simulated in 10 fully furnished real-world rooms. All signals are simulated at 32 kHz, accurately modelling low-frequency wave effects and high-frequency reflections.
arXiv Detail & Related papers (2025-10-27T09:17:44Z)
- AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds [38.75029700407531]
AUDETER is a large-scale, highly diverse deepfake audio dataset. It consists of over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders with a broad range of TTS/vocoder patterns. It is the largest deepfake audio dataset by scale.
arXiv Detail & Related papers (2025-09-04T16:03:44Z)
- End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation [8.11594945165255]
We propose an end-to-end deep learning framework for audio deepfake detection that operates directly on raw waveforms. Our model, RawNetLite, is a lightweight convolutional-recurrent architecture designed to capture both spectral and temporal features without handcrafted preprocessing.
arXiv Detail & Related papers (2025-04-29T16:38:23Z)
- Measuring the Robustness of Audio Deepfake Detectors [59.09338266364506]
This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions. Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations.
arXiv Detail & Related papers (2025-03-21T23:21:17Z)
- I Can Hear You: Selective Robust Training for Deepfake Audio Detection [16.52185019459127]
We establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples.
Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset.
We propose F-SAT, a Frequency-Selective Adversarial Training method that focuses on high-frequency components.
arXiv Detail & Related papers (2024-10-31T18:21:36Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio clip is generated by tampering only with the acoustic scene of the original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)