Anomaly Detection and Localization for Speech Deepfakes via Feature Pyramid Matching
- URL: http://arxiv.org/abs/2503.18032v1
- Date: Sun, 23 Mar 2025 11:15:22 GMT
- Title: Anomaly Detection and Localization for Speech Deepfakes via Feature Pyramid Matching
- Authors: Emma Coletta, Davide Salvi, Viola Negroni, Daniele Ugo Leonzio, Paolo Bestagini,
- Abstract summary: Speech deepfakes are synthetic audio signals that can imitate target speakers' voices. Existing methods for detecting speech deepfakes rely on supervised learning. We introduce a novel interpretable one-class detection framework, which reframes speech deepfake detection as an anomaly detection task.
- Score: 8.466707742593078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of AI-driven generative models has enabled the creation of highly realistic speech deepfakes - synthetic audio signals that can imitate target speakers' voices - raising critical security concerns. Existing methods for detecting speech deepfakes primarily rely on supervised learning, which suffers from two critical limitations: limited generalization to unseen synthesis techniques and a lack of explainability. In this paper, we address these issues by introducing a novel interpretable one-class detection framework, which reframes speech deepfake detection as an anomaly detection task. Our model is trained exclusively on real speech to characterize its distribution, enabling the classification of out-of-distribution samples as synthetically generated. Additionally, our framework produces interpretable anomaly maps during inference, highlighting anomalous regions across both time and frequency domains. This is done through a Student-Teacher Feature Pyramid Matching system, enhanced with Discrepancy Scaling to improve generalization capabilities across unseen data distributions. Extensive evaluations demonstrate the superior performance of our approach compared to the considered baselines, validating the effectiveness of framing speech deepfake detection as an anomaly detection problem.
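The abstract describes anomaly maps obtained by Student-Teacher Feature Pyramid Matching but gives no implementation details. The block below is a minimal sketch of the generic student-teacher recipe, not the authors' code: at each pyramid level the teacher and student feature vectors are L2-normalized and compared, the per-level discrepancy maps are resized to a common frequency-by-time grid, and the maps are multiplied together. All function names, the `[freq][time][channel]` layout, the nearest-neighbor resizing, and the toy features are assumptions made for illustration; Discrepancy Scaling is not shown.

```python
import math

def l2_normalize(vec):
    # Scale a feature vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def level_map(teacher, student):
    # Features are indexed [freq][time][channel] (assumed layout).
    # Returns a [freq][time] map of 0.5 * ||t_hat - s_hat||^2.
    out = []
    for t_row, s_row in zip(teacher, student):
        row = []
        for t_vec, s_vec in zip(t_row, s_row):
            t_hat, s_hat = l2_normalize(t_vec), l2_normalize(s_vec)
            row.append(0.5 * sum((a - b) ** 2 for a, b in zip(t_hat, s_hat)))
        out.append(row)
    return out

def upsample_nearest(m, height, width):
    # Nearest-neighbor resize; a simplification chosen to avoid dependencies.
    h, w = len(m), len(m[0])
    return [[m[i * h // height][j * w // width] for j in range(width)]
            for i in range(height)]

def anomaly_map(teacher_pyramid, student_pyramid, height, width):
    # Element-wise product of the resized per-level discrepancy maps.
    combined = [[1.0] * width for _ in range(height)]
    for t, s in zip(teacher_pyramid, student_pyramid):
        up = upsample_nearest(level_map(t, s), height, width)
        combined = [[c * u for c, u in zip(c_row, u_row)]
                    for c_row, u_row in zip(combined, up)]
    return combined

def anomaly_score(amap):
    # Utterance-level score: the largest value in the map (one common choice).
    return max(max(row) for row in amap)

def constant_features(height, width, vec):
    return [[list(vec) for _ in range(width)] for _ in range(height)]

# Toy two-level pyramid: the student matches the teacher everywhere except
# one fine-level cell and the coarse-level cell that covers it.
teacher_pyramid = [constant_features(4, 4, [1.0, 0.0]),
                   constant_features(2, 2, [1.0, 0.0])]
student_pyramid = [constant_features(4, 4, [1.0, 0.0]),
                   constant_features(2, 2, [1.0, 0.0])]
student_pyramid[0][1][2] = [0.0, 1.0]
student_pyramid[1][0][1] = [0.0, 1.0]

demo_map = anomaly_map(teacher_pyramid, student_pyramid, 4, 4)
clean_map = anomaly_map(teacher_pyramid, teacher_pyramid, 4, 4)
```

In this toy case `demo_map` is nonzero only at frequency index 1, time index 2, where both levels disagree, and `clean_map` is zero everywhere. Multiplying the levels keeps only regions flagged at every scale; a real system would take the features from pretrained and trained networks rather than constants.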
Related papers
- FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning [9.960675988638805]
We propose a novel framework called fake audio detection with evidential learning (FADEL).
FADEL incorporates model uncertainty into its predictions, thereby leading to more robust performance in OOD scenarios.
We demonstrate the validity of uncertainty estimation by analyzing a strong correlation between average uncertainty and equal error rate (EER) across different spoofing algorithms.
arXiv Detail & Related papers (2025-04-22T07:40:35Z) - Phoneme-Level Feature Discrepancies: A Key to Detecting Sophisticated Speech Deepfakes [13.218438914114019]
Phoneme features provide a powerful speech representation for deepfake detection.
We develop a new mechanism for detecting speech deepfakes by identifying the inconsistencies of phoneme-level speech features.
arXiv Detail & Related papers (2024-12-17T07:31:19Z) - Leveraging Mixture of Experts for Improved Speech Deepfake Detection [53.69740463004446]
Speech deepfakes pose a significant threat to personal security and content authenticity.
We introduce a novel approach for enhancing speech deepfake detection performance using a Mixture of Experts architecture.
arXiv Detail & Related papers (2024-09-24T13:24:03Z) - Statistics-aware Audio-visual Deepfake Detector [11.671275975119089]
Methods in audio-visual deepfake detection mostly assess the synchronization between audio and visual features.
We propose a statistical feature loss to enhance the discrimination capability of the model.
Experiments on the DFDC and FakeAVCeleb datasets demonstrate the relevance of the proposed method.
arXiv Detail & Related papers (2024-07-16T12:15:41Z) - Targeted Augmented Data for Audio Deepfake Detection [11.671275975119089]
We propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model.
Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities.
arXiv Detail & Related papers (2024-07-10T12:31:53Z) - Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z) - What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z) - NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection [50.33525966541906]
Existing multimodal detection methods capture audio-visual inconsistencies to expose Deepfake videos.
We propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics.
Our model can be easily adapted to the downstream Deepfake datasets with fine-tuning.
arXiv Detail & Related papers (2023-06-12T06:06:05Z) - Deep Spectro-temporal Artifacts for Detecting Synthesized Speech [57.42110898920759]
This paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection).
In this paper, spectro-temporal artifacts were detected using raw temporal signals, spectral features, as well as deep embedding features.
We ranked 4th and 5th in track 1 and track 2, respectively.
arXiv Detail & Related papers (2022-10-11T08:31:30Z) - Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)