WaveFake: A Data Set to Facilitate Audio Deepfake Detection
- URL: http://arxiv.org/abs/2111.02813v1
- Date: Thu, 4 Nov 2021 12:26:34 GMT
- Title: WaveFake: A Data Set to Facilitate Audio Deepfake Detection
- Authors: Joel Frank, Lea Schönherr
- Abstract summary: First, we provide an introduction to common signal processing techniques used for analyzing audio signals.
Second, we present a novel data set, for which we collected nine sample sets from five different network architectures, spanning two languages.
Third, we supply practitioners with two baseline models, adopted from the signal processing community, to facilitate further research in this area.
- Score: 3.8073142980733
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep generative modeling has the potential to cause significant harm to
society. Recognizing this threat, a multitude of research into detecting
so-called "Deepfakes" has emerged. This research most often focuses on the
image domain, while studies exploring generated audio signals have, so far,
been neglected. In this paper we make three key contributions to narrow this
gap. First, we provide researchers with an introduction to common signal
processing techniques used for analyzing audio signals. Second, we present a
novel data set, for which we collected nine sample sets from five different
network architectures, spanning two languages. Finally, we supply practitioners
with two baseline models, adopted from the signal processing community, to
facilitate further research in this area.
Related papers
- Analyzing the Impact of Splicing Artifacts in Partially Fake Speech Signals [15.595136769477614]
We analyze spliced audio tracks resulting from signal concatenation, investigate their artifacts and assess whether such artifacts introduce any bias in existing datasets.
Our findings reveal that by analyzing splicing artifacts, we can achieve a detection EER of 6.16% and 7.36% on PartialSpoof and HAD datasets, respectively.
arXiv Detail & Related papers (2024-08-25T09:28:04Z) - Towards generalizing deep-audio fake detection networks [1.0128808054306186]
Generative neural networks allow the creation of high-quality synthetic speech at scale.
We study the frequency domain fingerprints of current audio generators.
We train excellent lightweight detectors that generalize.
arXiv Detail & Related papers (2023-05-22T13:37:52Z) - System Fingerprint Recognition for Deepfake Audio: An Initial Dataset
and Investigation [51.06875680387692]
We present the first deepfake audio dataset for system fingerprint recognition (SFR).
We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies.
arXiv Detail & Related papers (2022-08-21T05:15:40Z) - Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches contribute to exploring the specific artifacts in deepfake videos.
We propose performing deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z) - On the Frequency Bias of Generative Models [61.60834513380388]
We analyze proposed measures against high-frequency artifacts in state-of-the-art GAN training.
We find that none of the existing approaches can fully resolve spectral artifacts yet.
Our results suggest that there is great potential in improving the discriminator.
arXiv Detail & Related papers (2021-11-03T18:12:11Z) - Audio-visual Representation Learning for Anomaly Events Detection in
Crowds [119.72951028190586]
This paper attempts to exploit multi-modal learning for modeling the audio and visual signals simultaneously.
We conduct experiments on the SHADE dataset, a synthetic audio-visual dataset in surveillance scenes.
We find that introducing audio signals effectively improves the performance of anomaly event detection and that the approach outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z) - Deep Learning Radio Frequency Signal Classification with Hybrid Images [0.0]
We focus on the different pre-processing steps that can be used on the input training data, and test the results on a fixed Deep Learning architecture.
We propose a hybrid image that takes advantage of both time and frequency domain information, and tackles the classification as a Computer Vision problem.
arXiv Detail & Related papers (2021-05-19T11:12:09Z) - Multi-attentional Deepfake Detection [79.80308897734491]
Face forgery by deepfake is widely spread over the internet and has raised severe societal concerns.
We propose a new multi-attentional deepfake detection network. Specifically, it consists of three key components: 1) multiple spatial attention heads to make the network attend to different local parts; 2) a textural feature enhancement block to zoom in on the subtle artifacts in shallow features; 3) an aggregation of the low-level textural features and high-level semantic features, guided by the attention maps.
arXiv Detail & Related papers (2021-03-03T13:56:14Z) - dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z) - Scattering Features for Multimodal Gait Recognition [5.3526997662068085]
We consider the problem of identifying people on the basis of their walk (gait) pattern.
We rely on acoustic and vibration measurements, obtained from a microphone and a geophone sensor, respectively.
arXiv Detail & Related papers (2020-01-23T22:11:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.