Detecting music deepfakes is easy but actually hard
- URL: http://arxiv.org/abs/2405.04181v2
- Date: Wed, 22 May 2024 09:31:21 GMT
- Title: Detecting music deepfakes is easy but actually hard
- Authors: Darius Afchar, Gabriel Meseguer-Brocal, Romain Hennequin
- Abstract summary: Music deepfakes pose a real threat of fraud on streaming services and unfair competition to human artists.
This paper demonstrates the possibility of training classifiers on datasets comprising real audio and fake reconstructions, achieving a convincing accuracy of 99.8%.
To our knowledge, this marks the first publication of a music deepfake detector, a tool that will help in the regulation of music forgery.
- Score: 8.070014188337307
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the face of a new era of generative models, the detection of artificially generated content has become a matter of utmost importance. The ability to create credible minute-long music deepfakes in a few seconds on user-friendly platforms poses a real threat of fraud on streaming services and unfair competition to human artists. This paper demonstrates the possibility (and surprising ease) of training classifiers on datasets comprising real audio and fake reconstructions, achieving a convincing accuracy of 99.8%. To our knowledge, this marks the first publication of a music deepfake detector, a tool that will help in the regulation of music forgery. Nevertheless, informed by decades of literature on forgery detection in other fields, we stress that a good test score is not the end of the story. We step back from the straightforward ML framework and expose many facets that could be problematic with such a deployed detector: calibration, robustness to audio manipulation, generalisation to unseen models, interpretability and possibility for recourse. This second part acts as a position for future research steps in the field and a caveat to a flourishing market of fake content checkers.
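To make the robustness concern concrete, below is a minimal sketch (not the authors' code) of the kind of stress test the abstract calls for. `detect_fake` is a hypothetical stand-in for any trained classifier returning P(fake); the manipulations are deliberately mild.

```python
import numpy as np

def add_noise(audio: np.ndarray, snr_db: float = 30.0) -> np.ndarray:
    """Add white noise at a target signal-to-noise ratio."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return audio + np.random.randn(len(audio)) * np.sqrt(noise_power)

def change_gain(audio: np.ndarray, db: float = -6.0) -> np.ndarray:
    """A volume change should not alter a robust detector's verdict."""
    return audio * (10 ** (db / 20))

def time_stretch_naive(audio: np.ndarray, rate: float = 1.02) -> np.ndarray:
    """Crude resampling-based stretch via linear interpolation."""
    positions = np.arange(0, len(audio) - 1, rate)
    return np.interp(positions, np.arange(len(audio)), audio)

def robustness_report(detect_fake, audio: np.ndarray) -> dict:
    """Compare P(fake) on the clean clip vs. lightly manipulated copies."""
    variants = {
        "clean": audio,
        "noise_30db": add_noise(audio),
        "gain_-6db": change_gain(audio),
        "stretch_1.02x": time_stretch_naive(audio),
    }
    return {name: float(detect_fake(clip)) for name, clip in variants.items()}
```

A deployable detector should produce roughly the same score across all four variants; large swings are exactly the manipulation-robustness failure mode the paper warns about.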
Related papers
- Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion [11.060929679400667]
We propose a multimodal, modular late-fusion pipeline that combines automatically transcribed lyrics and speech features capturing lyrics-related information within the audio.
Our method, DE-detect, outperforms existing lyrics-based detectors while also being more robust to audio perturbations (a sketch of the fusion idea follows this entry).
arXiv Detail & Related papers (2025-06-19T02:56:49Z)
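As referenced above, a hedged sketch of the late-fusion idea: two unimodal heads whose logits are mixed. The dimensions, architecture, and learned mixing weight are illustrative assumptions, not DE-detect's actual design.

```python
import torch
import torch.nn as nn

class LateFusionDetector(nn.Module):
    """Two unimodal heads (lyrics text, audio/speech) fused at the logit level."""
    def __init__(self, text_dim: int = 768, audio_dim: int = 512):
        super().__init__()
        self.text_head = nn.Sequential(nn.Linear(text_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.audio_head = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        # Learned scalar mixing weight between the two views (an assumption here).
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, text_emb: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        t = self.text_head(text_emb)    # logit from transcribed-lyrics features
        a = self.audio_head(audio_emb)  # logit from speech/audio features
        return torch.sigmoid(self.alpha * t + (1 - self.alpha) * a)  # P(AI-generated)

# Usage: embeddings would come from an ASR+text encoder and an audio encoder.
model = LateFusionDetector()
p_fake = model(torch.randn(4, 768), torch.randn(4, 512))
```

Late fusion keeps the two views modular, so either head can be swapped out or retrained independently.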
- Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook [101.30779332427217]
We survey deepfake generation and detection techniques, including the most recent developments in the field.
We identify various kinds of deepfakes, according to the procedure used to alter or generate the fake content.
We develop a novel multimodal benchmark to evaluate deepfake detectors on out-of-distribution content.
arXiv Detail & Related papers (2024-11-29T08:29:25Z)
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion, and financial fraud.
In our research, we propose geometric-fakeness features (GFF), which characterize the dynamic degree of a face's presence in a video.
We apply our approach to videos in which multiple faces are present simultaneously (a toy per-frame sketch follows this entry).
arXiv Detail & Related papers (2024-10-10T13:10:34Z)
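A toy illustration of the geometric-fakeness intuition, as promised above: summarise the dynamics of a per-frame face-presence score. The real GFF definition is the paper's; `face_scores` here is simply any tracker or detector confidence series.

```python
import numpy as np

def gff_style_features(face_scores: np.ndarray) -> np.ndarray:
    """Summarise how stably a face is present across a video.

    face_scores: shape (n_frames,), e.g. detector confidence for one tracked face.
    Genuine footage tends to vary smoothly; splices and swaps often jitter.
    """
    diffs = np.diff(face_scores)
    return np.array([
        face_scores.mean(),           # average presence
        face_scores.std(),            # overall variability
        np.abs(diffs).mean(),         # frame-to-frame jitter
        (np.abs(diffs) > 0.2).mean()  # fraction of abrupt jumps
    ])

# With multiple simultaneous faces, compute one feature vector per tracked face.
scores_per_face = {"face_0": np.random.rand(300), "face_1": np.random.rand(300)}
features = {fid: gff_style_features(s) for fid, s in scores_per_face.items()}
```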
- Fooling State-of-the-Art Deepfake Detection with High-Quality Deepfakes [2.0883760606514934]
We show that deepfake detectors proven to generalize well on multiple research datasets still struggle in real-world scenarios with well-crafted fakes.
We propose a novel autoencoder for face swapping alongside an advanced face blending technique, which we utilize to generate 90 high-quality deepfakes (a minimal face-swap autoencoder sketch follows this entry).
arXiv Detail & Related papers (2023-05-09T09:08:49Z)
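For background, a minimal sketch of the classic shared-encoder, per-identity-decoder face-swap autoencoder that this line of work builds on; the paper's generator and blending step are considerably more advanced, so treat this as illustration only.

```python
import torch
import torch.nn as nn

class SwapAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # One encoder learns identity-agnostic face structure...
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        # ...while each identity gets its own decoder.
        self.decoders = nn.ModuleDict({
            ident: nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )
            for ident in ("A", "B")
        })

    def forward(self, x: torch.Tensor, identity: str) -> torch.Tensor:
        return self.decoders[identity](self.encoder(x))

# Swapping: encode a face of identity A, decode with B's decoder.
model = SwapAutoencoder()
swapped = model(torch.rand(1, 3, 64, 64), identity="B")
```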
- Why Do Facial Deepfake Detectors Fail? [9.60306700003662]
Recent advancements in deepfake technology have allowed the creation of highly realistic fake media, such as video, image, and audio.
These materials pose significant challenges to human authentication, such as impersonation, misinformation, or even a threat to national security.
Several deepfake detection algorithms have been proposed, leading to an ongoing arms race between deepfake creators and deepfake detectors.
arXiv Detail & Related papers (2023-02-25T20:54:02Z)
- Deepfake CAPTCHA: A Method for Preventing Fake Calls [5.810459869589559]
We propose D-CAPTCHA: an active defense against real-time deepfakes.
The approach is to force the adversary into the spotlight by challenging the deepfake model to generate content which exceeds its capabilities.
In contrast to existing CAPTCHAs, we challenge the AI's ability to create content as opposed to its ability to classify content (a protocol sketch follows this entry).
arXiv Detail & Related papers (2023-01-08T15:34:19Z)
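A conceptual sketch of the active challenge-response protocol, as noted above. The challenge list, `respond`, and `verify` are illustrative placeholders, not the paper's API.

```python
import random
import time

CHALLENGES = [
    "hum the melody you just heard",
    "whisper the following sentence",
    "say this phrase while clapping twice",
]

def run_challenge(respond, verify, timeout_s: float = 5.0) -> bool:
    """respond(prompt) -> audio; verify(prompt, audio) -> confidence in [0, 1]."""
    prompt = random.choice(CHALLENGES)
    start = time.monotonic()
    audio = respond(prompt)
    latency = time.monotonic() - start
    # Real-time deepfake pipelines struggle both with the content of the task
    # and with answering quickly, so gate on both signals.
    return latency <= timeout_s and verify(prompt, audio) >= 0.5
```

The design point is to shift the burden: instead of the defender classifying arbitrary content, the attacker's generator must produce content beyond its training distribution on demand.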
- Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment (a sketch of this scheme follows this entry).
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
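The sketch promised above: detection by speaker verification reduces to comparing a test clip's embedding with enrolled genuine recordings of the claimed speaker. `embed` stands in for any off-the-shelf speaker-verification model; the threshold is an assumption.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_deepfake(embed, test_audio, enrolled_audios, threshold: float = 0.6) -> bool:
    """Flag the clip if it does not match the claimed speaker's voice."""
    test_emb = embed(test_audio)
    # Average similarity to the enrolled genuine recordings.
    sims = [cosine(test_emb, embed(ref)) for ref in enrolled_audios]
    return float(np.mean(sims)) < threshold
```

Because the approach never sees fake examples at training time, it generalizes to unseen synthesis methods by construction, which matches the paper's claim.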
- Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines [17.7858728343141]
Deepfakes pose a serious threat to digital well-being by fueling misinformation.
We introduce a framework for amplifying artifacts in deepfake videos to make them more detectable by people.
We propose a novel, semi-supervised Artifact Attention module, which is trained on human responses to create attention maps that highlight video artifacts (a sketch of the amplification step follows this entry).
arXiv Detail & Related papers (2022-06-01T14:43:49Z)
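An illustrative sketch of the "caricature" step: given an attention map over suspected artifacts, exaggerate frame-to-frame changes there so humans notice them. The trained Artifact Attention module itself is the paper's contribution and is assumed here as an input.

```python
import numpy as np

def caricature_frames(frames: np.ndarray, attention: np.ndarray, gain: float = 3.0) -> np.ndarray:
    """frames: (T, H, W) grayscale video; attention: (T, H, W) values in [0, 1]."""
    out = frames.astype(np.float64).copy()
    for t in range(1, len(frames)):
        motion = out[t] - out[t - 1]
        # Amplify temporal changes only where artifacts are suspected.
        out[t] = out[t - 1] + motion * (1.0 + gain * attention[t])
    return np.clip(out, 0, 255).astype(np.uint8)
```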
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos (a sketch of the contrastive objective follows this entry).
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
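The contrastive objective referenced above, in its generic InfoNCE form: pull audio and face-motion embeddings of the same identity together and push different identities apart. The paper's identity-specific training setup differs in detail.

```python
import torch
import torch.nn.functional as F

def info_nce(audio_emb: torch.Tensor, video_emb: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """audio_emb, video_emb: (B, D); row i of each comes from the same identity."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / tau        # (B, B) similarity matrix
    targets = torch.arange(len(a))  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```

At test time, a clip whose audio and face-motion embeddings fail to match the enrolled identity's signature is flagged, which is why the scheme handles both unimodal and multimodal attacks.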
- Partially Fake Audio Detection by Self-attention-based Fake Span Discovery [89.21979663248007]
We propose a novel framework that introduces a question-answering (fake span discovery) strategy with self-attention to detect partially fake audio.
Our submission ranked second in the partially fake audio detection track of ADD 2022 (a sketch of the span-prediction head follows this entry).
arXiv Detail & Related papers (2022-02-14T13:20:55Z)
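A sketch of the fake-span-discovery idea, following the extractive-QA analogy above: self-attention over acoustic frames plus per-frame start/end logits. Layer sizes and input features are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FakeSpanFinder(nn.Module):
    def __init__(self, feat_dim: int = 80, d_model: int = 128):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.span_head = nn.Linear(d_model, 2)  # start / end logits per frame

    def forward(self, frames: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        """frames: (B, T, feat_dim) acoustic features, e.g. log-mel frames."""
        h = self.encoder(self.proj(frames))
        start_logits, end_logits = self.span_head(h).unbind(dim=-1)
        return start_logits, end_logits  # each (B, T)

model = FakeSpanFinder()
start, end = model(torch.randn(2, 400, 80))
```

As in extractive QA, the predicted span is the (start, end) frame pair with the highest combined score, localizing the fake segment rather than just classifying the whole clip.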
- Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z)
- Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data [64.65952078807086]
Photorealistic image generation has reached a new level of quality due to the breakthroughs of generative adversarial networks (GANs).
Yet, the dark side of such deepfakes, the malicious use of generated media, raises concerns about visual misinformation.
We seek a proactive and sustainable solution to deepfake detection by introducing artificial fingerprints into the models (a toy sketch follows this entry).
arXiv Detail & Related papers (2020-07-16T16:49:55Z)
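A toy additive version of artificial fingerprinting, as flagged above: stamp a faint carrier pattern into training images so models trained on them reproduce it and a correlator can read it back. The paper learns the embedding and decoding end-to-end; nothing below is its actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
KEY = rng.choice([-1.0, 1.0], size=(64, 64))  # secret carrier pattern

def embed_fingerprint(img: np.ndarray, bit: int, eps: float = 0.02) -> np.ndarray:
    """Add (or subtract) a faint carrier; img in [0, 1], shape (64, 64)."""
    sign = 1.0 if bit else -1.0
    return np.clip(img + sign * eps * KEY, 0.0, 1.0)

def decode_fingerprint(img: np.ndarray) -> int:
    """Correlate the mean-centred image against the carrier to read the bit."""
    return int(((img - img.mean()) * KEY).sum() > 0)

stamped = embed_fingerprint(rng.random((64, 64)), bit=1)
decoded = decode_fingerprint(stamped)  # recovers 1 with high probability
```

Because every output of a model trained on stamped data carries the fingerprint, detection becomes attribution: the decoder identifies which model family produced the image.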