Harder or Different? Understanding Generalization of Audio Deepfake Detection
- URL: http://arxiv.org/abs/2406.03512v3
- Date: Wed, 12 Jun 2024 16:54:01 GMT
- Authors: Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger
- Abstract summary: Recent research has highlighted a key issue in speech deepfake detection: models trained on one set of deepfakes perform poorly on others.
The question arises: is this due to the continuously improving quality of Text-to-Speech (TTS) models, i.e., are newer DeepFakes just 'harder' to detect?
We answer this question by decomposing the performance gap between in-domain and out-of-domain test data into 'hardness' and 'difference' components.
- Score: 8.878420552256266
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research has highlighted a key issue in speech deepfake detection: models trained on one set of deepfakes perform poorly on others. The question arises: is this due to the continuously improving quality of Text-to-Speech (TTS) models, i.e., are newer DeepFakes just 'harder' to detect? Or, is it because deepfakes generated with one model are fundamentally different to those generated using another model? We answer this question by decomposing the performance gap between in-domain and out-of-domain test data into 'hardness' and 'difference' components. Experiments performed using ASVspoof databases indicate that the hardness component is practically negligible, with the performance gap being attributed primarily to the difference component. This has direct implications for real-world deepfake detection, highlighting that merely increasing model capacity, the currently-dominant research trend, may not effectively address the generalization challenge.
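The decomposition can be expressed in a few lines. Below is a minimal sketch, assuming the gap is measured as an error-rate difference (e.g., EER) across three conditions: a model trained and tested in-domain on the source deepfakes, a model trained and tested in-domain on the target deepfakes, and the source-trained model tested out-of-domain on the target deepfakes. The function name and the example numbers are illustrative, not taken from the paper.

```python
def decompose_gap(err_in_source: float, err_in_target: float,
                  err_cross: float) -> dict:
    """Split the in-domain vs. out-of-domain gap into two components.

    err_in_source: error of a model trained and tested on the source deepfakes
    err_in_target: error of a model trained and tested on the target deepfakes
    err_cross:     error of the source-trained model tested on the target deepfakes
    """
    hardness = err_in_target - err_in_source  # target intrinsically harder to detect
    difference = err_cross - err_in_target    # extra error caused by the domain shift
    return {
        "hardness": hardness,
        "difference": difference,
        "total_gap": hardness + difference,   # = err_cross - err_in_source
    }

# Illustrative numbers only: the paper's finding corresponds to a near-zero
# hardness component and a dominant difference component.
print(decompose_gap(err_in_source=0.02, err_in_target=0.03, err_cross=0.25))
```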
Related papers
- FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection [12.594436202557446]
This paper investigates why Vision Transformers (ViTs) exhibit suboptimal performance in detecting facial forgeries.
We propose a deepfake detection framework called FakeFormer, which extends ViTs to enforce the extraction of subtle inconsistency-prone information.
Experiments are conducted on diverse well-known datasets, including FF++, Celeb-DF, WildDeepfake, DFD, DFDCP, and DFDC.
arXiv Detail & Related papers (2024-10-29T11:36:49Z)
- Metamorphic Testing-based Adversarial Attack to Fool Deepfake Detectors [2.0649235321315285]
There is a dire need for deepfake detection technology to help spot deepfake media.
Current deepfake detection models can achieve outstanding accuracy (>90%).
This study identifies makeup application as an adversarial attack that can fool deepfake detectors (a generic consistency check along these lines is sketched after this entry).
arXiv Detail & Related papers (2022-04-19T02:24:30Z)
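The metamorphic-testing idea reduces to a consistency check: a semantics-preserving transformation (here, applied makeup) should not flip a detector's verdict, and a flipped verdict exposes an adversarial weakness. The sketch below is a generic illustration under that assumption; `detector` and `apply_makeup` are hypothetical stand-ins, not the study's implementation.

```python
from typing import Callable

import numpy as np

def makeup_flips_verdict(detector: Callable[[np.ndarray], float],
                         apply_makeup: Callable[[np.ndarray], np.ndarray],
                         image: np.ndarray,
                         threshold: float = 0.5) -> bool:
    """Return True if the makeup transformation flips the detector's verdict.

    A robust deepfake detector should be invariant to makeup, so a flipped
    verdict means the transformation works as an adversarial attack.
    """
    original_is_fake = detector(image) >= threshold
    transformed_is_fake = detector(apply_makeup(image)) >= threshold
    return original_is_fake != transformed_is_fake
```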
- Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches focus on exploring specific artifacts in deepfake videos.
We propose to perform deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z)
- Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability.
We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z)
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect samples causing oversensitivity and overstability with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space (a generic sketch of this idea follows the entry).
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
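One common way to embed high- and low-quality samples into a similar vector space is to add an alignment loss between paired clean and noisy inputs. The PyTorch sketch below shows this generic pattern; the architecture and loss choice are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedEncoder(nn.Module):
    """Toy encoder trained so clean and noisy inputs land close together."""

    def __init__(self, input_dim: int = 128, hidden_dim: int = 64, num_labels: int = 7):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, clean: torch.Tensor, noisy: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
        z_clean = self.encoder(clean)
        z_noisy = self.encoder(noisy)
        # Standard task loss on the clean embedding.
        task_loss = F.cross_entropy(self.classifier(z_clean), labels)
        # Pull the noisy embedding towards its clean counterpart.
        align_loss = F.mse_loss(z_noisy, z_clean.detach())
        return task_loss + align_loss
```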
- Multi-attentional Deepfake Detection [79.80308897734491]
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns.
We propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads to make the network attend to different local parts; 2) a textural feature enhancement block to zoom in on subtle artifacts in shallow features; 3) attention-guided aggregation of low-level textural features and high-level semantic features (a loose module sketch follows this entry).
arXiv Detail & Related papers (2021-03-03T13:56:14Z)
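The three components map onto a compact module. The PyTorch sketch below loosely illustrates the described structure (several spatial attention maps, a texture-enhancement branch, and attention-guided aggregation); all layer shapes are arbitrary assumptions, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiAttentionalSketch(nn.Module):
    """Loose sketch: spatial attention heads + texture branch + aggregation."""

    def __init__(self, channels: int = 64, num_heads: int = 4, num_classes: int = 2):
        super().__init__()
        self.backbone = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        # 1) multiple spatial attention heads attending to different local parts
        self.attention = nn.Conv2d(channels, num_heads, kernel_size=1)
        # 2) textural feature enhancement of the shallow features
        self.texture = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.classifier = nn.Linear(num_heads * channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.relu(self.backbone(x))       # shallow features
        texture = torch.relu(self.texture(feats))  # enhanced textures
        attn = torch.softmax(self.attention(feats).flatten(2), dim=-1)  # (B, heads, HW)
        # 3) attention-guided aggregation of the textural features
        pooled = torch.einsum("bhn,bcn->bhc", attn, texture.flatten(2))
        return self.classifier(pooled.flatten(1))
```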
- Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)