System Fingerprint Recognition for Deepfake Audio: An Initial Dataset
and Investigation
- URL: http://arxiv.org/abs/2208.10489v3
- Date: Fri, 15 Sep 2023 07:19:46 GMT
- Title: System Fingerprint Recognition for Deepfake Audio: An Initial Dataset
and Investigation
- Authors: Xinrui Yan, Jiangyan Yi, Chenglong Wang, Jianhua Tao, Junzuo Zhou, Hao
Gu, Ruibo Fu
- Abstract summary: We present the first deepfake audio dataset for system fingerprint recognition (SFR).
We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies.
- Score: 51.06875680387692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress of deep speech synthesis models has posed significant
threats to society such as malicious content manipulation. Therefore, many
studies have emerged to detect the so-called deepfake audio. However, existing
works focus on the binary detection of real audio and fake audio. In real-world
scenarios such as model copyright protection and digital evidence forensics, it
is necessary to know which tool or model generated the deepfake audio in order to explain
the decision. This motivates us to ask: Can we recognize the system
fingerprints of deepfake audio? In this paper, we present the first deepfake
audio dataset for system fingerprint recognition (SFR) and conduct an initial
investigation. We collected the dataset from the speech synthesis systems of
seven Chinese vendors that use the latest state-of-the-art deep learning
technologies, including both clean and compressed sets. In addition, to
facilitate the further development of system fingerprint recognition methods,
we provide extensive benchmarks for comparison, along with research findings. The
dataset will be publicly available.
Related papers
- Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes [13.042731289687918]
We present the first realistic audio-visual database of deepfakes SWAN-DF, where lips and speech are well synchronized.
We demonstrate the vulnerability of a state-of-the-art speaker recognition system, such as the ECAPA-TDNN-based model from SpeechBrain.
arXiv Detail & Related papers (2023-11-29T14:18:04Z)
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio is generated by only tampering with the acoustic scene of an original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio [53.134423013599914]
We propose a new problem for detecting vocoder fingerprints of fake audio.
Experiments are conducted on the datasets synthesized by eight state-of-the-art vocoders.
arXiv Detail & Related papers (2022-08-20T09:23:21Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Partially Fake Audio Detection by Self-attention-based Fake Span Discovery [89.21979663248007]
We propose a novel framework by introducing the question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audios.
Our submission ranked second in the partially fake audio detection track of ADD 2022.
arXiv Detail & Related papers (2022-02-14T13:20:55Z)
- WaveFake: A Data Set to Facilitate Audio Deepfake Detection [3.8073142980733]
This paper provides an introduction to signal processing techniques used for analyzing audio signals.
Second, we present a novel data set, for which we collected nine sample sets from five different network architectures, spanning two languages.
Third, we supply practitioners with two baseline models, adopted from the signal processing community, to facilitate further research in this area.
arXiv Detail & Related papers (2021-11-04T12:26:34Z)
- Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors [18.862258543488355]
Deepfakes can cause security and privacy issues.
A new domain of cloning human voices using deep-learning technologies is also emerging.
To develop a good deepfake detector, we need a detector that can detect deepfakes of multiple modalities.
arXiv Detail & Related papers (2021-09-07T11:00:20Z)
- Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues [75.1731999380562]
We present a learning-based method for detecting real and fake deepfake multimedia content.
We extract and analyze the similarity between the two audio and visual modalities from within the same video.
We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets.
arXiv Detail & Related papers (2020-03-14T22:07:26Z)
- SynFi: Automatic Synthetic Fingerprint Generation [23.334625222079634]
We introduce a new approach to automatically generate high-fidelity synthetic fingerprints at scale.
We show that our methodology is the first to generate fingerprints that are computationally indistinguishable from real ones.
arXiv Detail & Related papers (2020-02-16T07:45:29Z)