WaveFake: A Data Set to Facilitate Audio Deepfake Detection
- URL: http://arxiv.org/abs/2111.02813v1
- Date: Thu, 4 Nov 2021 12:26:34 GMT
- Title: WaveFake: A Data Set to Facilitate Audio Deepfake Detection
- Authors: Joel Frank, Lea Schönherr
- Abstract summary: First, this paper provides an introduction to common signal processing techniques used for analyzing audio signals.
Second, we present a novel data set, for which we collected nine sample sets from five different network architectures, spanning two languages.
Third, we supply practitioners with two baseline models, adopted from the signal processing community, to facilitate further research in this area.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep generative modeling has the potential to cause significant harm to
society. Recognizing this threat, a considerable body of research into
detecting so-called "Deepfakes" has emerged. This research most often focuses
on the image domain, while generated audio signals have, so far, received far
less attention. In this paper, we make three key contributions to narrow this
gap. First, we provide researchers with an introduction to common signal
processing techniques used for analyzing audio signals. Second, we present a
novel data set, for which we collected nine sample sets from five different
network architectures, spanning two languages. Finally, we supply practitioners
with two baseline models, adopted from the signal processing community, to
facilitate further research in this area.
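As a companion to the abstract's first contribution (an introduction to common signal processing techniques for audio), here is a minimal, hypothetical sketch of a typical analysis front end: loading a clip and computing a log-mel spectrogram with librosa. The file name and parameters are placeholders, not taken from the paper.

```python
import librosa
import numpy as np

# Hypothetical input file; any mono WAV/FLAC clip works.
wav_path = "sample.wav"

# Load audio at 16 kHz (a common rate for speech; the paper may use another).
y, sr = librosa.load(wav_path, sr=16000)

# Short-time Fourier transform -> mel filter bank -> log compression.
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=512, hop_length=160, n_mels=80
)
log_mel = librosa.power_to_db(mel, ref=np.max)

print(log_mel.shape)  # (n_mels, n_frames)
```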
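The abstract's final contribution is two baseline models adopted from the signal processing community. The sketch below is a hedged stand-in for one common family of such baselines: a Gaussian mixture model over per-frame cepstral features, scored as a log-likelihood ratio. It uses MFCCs and scikit-learn for illustration only; the paper's actual baselines and features may differ, and all file names are hypothetical.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def frame_features(path, sr=16000, n_mfcc=20):
    """Per-frame MFCC features, shape (n_frames, n_mfcc)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T

# Hypothetical file lists; replace with real and generated audio paths.
real_files = ["real_000.wav", "real_001.wav"]
fake_files = ["fake_000.wav", "fake_001.wav"]

# Fit one GMM per class on pooled frames (diagonal covariances keep it cheap).
gmm_real = GaussianMixture(n_components=64, covariance_type="diag")
gmm_fake = GaussianMixture(n_components=64, covariance_type="diag")
gmm_real.fit(np.vstack([frame_features(f) for f in real_files]))
gmm_fake.fit(np.vstack([frame_features(f) for f in fake_files]))

def llr_score(path):
    """Average log-likelihood ratio; higher means more 'real'-like."""
    x = frame_features(path)
    return gmm_real.score(x) - gmm_fake.score(x)

print(llr_score("test_clip.wav"))
```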
Related papers
- Understanding Audiovisual Deepfake Detection: Techniques, Challenges, Human Factors and Perceptual Insights [49.81915942821647]
Deep Learning has been successfully applied in diverse fields, and its impact on deepfake detection is no exception.
Deepfakes are fake yet realistic synthetic content that can be used deceitfully for political impersonation, phishing, slandering, or spreading misinformation.
This paper aims to improve the effectiveness of deepfake detection strategies and guide future research in cybersecurity and media integrity.
arXiv Detail & Related papers (2024-11-12T09:02:11Z)
- Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals [15.595136769477614]
We analyze spliced audio tracks resulting from signal concatenation, investigate their artifacts and assess whether such artifacts introduce any bias in existing datasets.
Our findings reveal that by analyzing splicing artifacts, we can achieve detection EERs of 6.16% and 7.36% on the PartialSpoof and HAD datasets, respectively (see the EER computation sketch after this list).
arXiv Detail & Related papers (2024-08-25T09:28:04Z)
- Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization [3.9440964696313485]
In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity.
Deepfakes based on multi-modal manipulation, such as audio-visual, are more realistic and pose a greater threat.
We propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection.
arXiv Detail & Related papers (2024-08-02T18:45:01Z)
- Towards generalizing deep-audio fake detection networks [1.0128808054306186]
Generative neural networks allow the creation of high-quality synthetic speech at scale.
We study the frequency-domain fingerprints of current audio generators (see the spectrum-averaging sketch after this list).
We train excellent lightweight detectors that generalize.
arXiv Detail & Related papers (2023-05-22T13:37:52Z)
- Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches focus on exploring specific artifacts in deepfake videos.
We propose to perform deepfake detection from a previously unexplored voice-face matching perspective.
Our model obtains significantly improved performance compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z)
- On the Frequency Bias of Generative Models [61.60834513380388]
We analyze proposed measures against high-frequency artifacts in state-of-the-art GAN training.
We find that none of the existing approaches can fully resolve spectral artifacts yet.
Our results suggest that there is great potential in improving the discriminator.
arXiv Detail & Related papers (2021-11-03T18:12:11Z)
- Audio-visual Representation Learning for Anomaly Events Detection in Crowds [119.72951028190586]
This paper attempts to exploit multi-modal learning for modeling the audio and visual signals simultaneously.
We conduct experiments on the SHADE dataset, a synthetic audio-visual dataset of surveillance scenes.
We find that introducing audio signals effectively improves anomaly event detection and outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z)
- Deep Learning Radio Frequency Signal Classification with Hybrid Images [0.0]
We focus on the different pre-processing steps that can be used on the input training data, and test the results on a fixed Deep Learning architecture.
We propose a hybrid image that takes advantage of both time and frequency domain information, and tackles the classification as a Computer Vision problem.
arXiv Detail & Related papers (2021-05-19T11:12:09Z)
- Multi-attentional Deepfake Detection [79.80308897734491]
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns.
We propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads that make the network attend to different local parts; 2) a textural feature enhancement block that zooms in on the subtle artifacts in shallow features; and 3) an aggregation of the low-level textural features and high-level semantic features guided by the attention maps.
arXiv Detail & Related papers (2021-03-03T13:56:14Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Scattering Features for Multimodal Gait Recognition [5.3526997662068085]
We consider the problem of identifying people on the basis of their walk (gait) pattern.
We rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively.
arXiv Detail & Related papers (2020-01-23T22:11:38Z)
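Several entries above report equal error rates (EER), e.g. the splicing-artifacts study (6.16% / 7.36%). EER is the operating point where the false-positive and false-negative rates coincide. Below is a minimal sketch of how such a number can be computed from detection scores, assuming binary labels and scikit-learn; the scores shown are hypothetical and purely illustrative.

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: operating point where false-positive rate == false-negative rate."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

# Hypothetical labels (1 = spoofed) and detector scores.
labels = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])
print(f"EER = {equal_error_rate(labels, scores):.2%}")
```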
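Two entries above ("Towards generalizing deep-audio fake detection networks" and "On the Frequency Bias of Generative Models") analyze frequency-domain fingerprints and artifacts of generators. As a rough, assumption-laden illustration of one way such a fingerprint can be visualized (not either paper's actual method), the sketch below averages log-magnitude spectra over real and generated clips and takes their difference; all file names are hypothetical.

```python
import numpy as np
import librosa

def mean_log_spectrum(paths, sr=16000, n_fft=1024):
    """Average log-magnitude STFT spectrum over files, shape (n_fft // 2 + 1,)."""
    spectra = []
    for path in paths:
        y, _ = librosa.load(path, sr=sr)
        mag = np.abs(librosa.stft(y, n_fft=n_fft))
        spectra.append(np.log(mag.mean(axis=1) + 1e-8))
    return np.mean(spectra, axis=0)

# Hypothetical file lists of real and vocoder-generated speech.
real_profile = mean_log_spectrum(["real_000.wav", "real_001.wav"])
fake_profile = mean_log_spectrum(["fake_000.wav", "fake_001.wav"])

# The difference highlights frequency bands where the generator deviates.
fingerprint = fake_profile - real_profile
print(fingerprint.shape)
```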
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy or quality of the information above and is not responsible for any consequences arising from its use.