Towards generalizing deep-audio fake detection networks
- URL: http://arxiv.org/abs/2305.13033v3
- Date: Tue, 9 Apr 2024 16:22:13 GMT
- Title: Towards generalizing deep-audio fake detection networks
- Authors: Konstantin Gasenzer, Moritz Wolter,
- Abstract summary: generative neural networks allow the creation of high-quality synthetic speech at scale.
We study the frequency domain fingerprints of current audio generators.
We train excellent lightweight detectors that generalize.
- Score: 1.0128808054306186
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Today's generative neural networks allow the creation of high-quality synthetic speech at scale. While we welcome the creative use of this new technology, we must also recognize the risks. As synthetic speech is abused for monetary and identity theft, we require a broad set of deepfake identification tools. Furthermore, previous work reported a limited ability of deep classifiers to generalize to unseen audio generators. We study the frequency domain fingerprints of current audio generators. Building on top of the discovered frequency footprints, we train excellent lightweight detectors that generalize. We report improved results on the WaveFake dataset and an extended version. To account for the rapid progress in the field, we extend the WaveFake dataset by additionally considering samples drawn from the novel Avocodo and BigVGAN networks. For illustration purposes, the supplementary material contains audio samples of generator artifacts.
Related papers
- Targeted Augmented Data for Audio Deepfake Detection [11.671275975119089]
We propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model.
Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities.
arXiv Detail & Related papers (2024-07-10T12:31:53Z) - Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z) - Frequency-Aware Deepfake Detection: Improving Generalizability through
Frequency Space Learning [81.98675881423131]
This research addresses the challenge of developing a universal deepfake detector that can effectively identify unseen deepfake images.
Existing frequency-based paradigms have relied on frequency-level artifacts introduced during the up-sampling in GAN pipelines to detect forgeries.
We introduce a novel frequency-aware approach called FreqNet, centered around frequency domain learning, specifically designed to enhance the generalizability of deepfake detectors.
arXiv Detail & Related papers (2024-03-12T01:28:00Z) - Rethinking the Up-Sampling Operations in CNN-based Generative Network
for Generalizable Deepfake Detection [86.97062579515833]
We introduce the concept of Neighboring Pixel Relationships(NPR) as a means to capture and characterize the generalized structural artifacts stemming from up-sampling operations.
A comprehensive analysis is conducted on an open-world dataset, comprising samples generated by tft28 distinct generative models.
This analysis culminates in the establishment of a novel state-of-the-art performance, showcasing a remarkable tft11.6% improvement over existing methods.
arXiv Detail & Related papers (2023-12-16T14:27:06Z) - MIS-AVoiDD: Modality Invariant and Specific Representation for
Audio-Visual Deepfake Detection [4.659427498118277]
A novel kind of deepfakes has emerged with either audio or visual modalities manipulated.
Existing multimodal deepfake detectors are often based on the fusion of the audio and visual streams from the video.
In this paper, we tackle the problem at the representation level to aid the fusion of audio and visual streams for multimodal deepfake detection.
arXiv Detail & Related papers (2023-10-03T17:43:24Z) - Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z) - System Fingerprint Recognition for Deepfake Audio: An Initial Dataset
and Investigation [51.06875680387692]
We present the first deepfake audio dataset for system fingerprint recognition (SFR)
We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies.
arXiv Detail & Related papers (2022-08-21T05:15:40Z) - WaveFake: A Data Set to Facilitate Audio Deepfake Detection [3.8073142980733]
This paper provides an introduction to signal processing techniques used for analyzing audio signals.
Second, we present a novel data set, for which we collected nine sample sets from five different network architectures, spanning two languages.
Third, we supply practitioners with two baseline models, adopted from the signal processing community, to facilitate further research in this area.
arXiv Detail & Related papers (2021-11-04T12:26:34Z) - A Generative Model for Raw Audio Using Transformer Architectures [4.594159253008448]
This paper proposes a novel way of doing audio synthesis at the waveform level using Transformer architectures.
We propose a deep neural network for generating waveforms, similar to wavenet citeoord2016wavenet.
Our approach outperforms a widely used wavenet architecture by up to 9% on a similar dataset for predicting the next step.
arXiv Detail & Related papers (2021-06-30T13:05:31Z) - Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are commonly indistinguishable from real to human eyes.
We propose a novel fake detection that is designed to re-synthesize testing images and extract visual cues for detection.
We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.