System Fingerprint Recognition for Deepfake Audio: An Initial Dataset
and Investigation
- URL: http://arxiv.org/abs/2208.10489v3
- Date: Fri, 15 Sep 2023 07:19:46 GMT
- Title: System Fingerprint Recognition for Deepfake Audio: An Initial Dataset
and Investigation
- Authors: Xinrui Yan, Jiangyan Yi, Chenglong Wang, Jianhua Tao, Junzuo Zhou, Hao
Gu, Ruibo Fu
- Abstract summary: We present the first deepfake audio dataset for system fingerprint recognition (SFR).
We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies.
- Score: 51.06875680387692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress of deep speech synthesis models has posed significant
threats to society such as malicious content manipulation. Therefore, many
studies have emerged to detect the so-called deepfake audio. However, existing
works focus on the binary detection of real audio and fake audio. In real-world
scenarios such as model copyright protection and digital evidence forensics, it
is necessary to know which tool or model generated the deepfake audio in order
to explain the decision. This motivates us to ask: Can we recognize the system
fingerprints of deepfake audio? In this paper, we present the first deepfake
audio dataset for system fingerprint recognition (SFR) and conduct an initial
investigation. We collected the dataset from the speech synthesis systems of
seven Chinese vendors that use the latest state-of-the-art deep learning
technologies, including both clean and compressed sets. In addition, to
facilitate further development of system fingerprint recognition methods, we
provide extensive benchmarks for comparison and report our research findings.
The dataset will be publicly available.
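To make the SFR task concrete, here is a minimal sketch of a multi-class system-fingerprint classifier over log-mel features. The 16 kHz sample rate, 80 mel bins, eight-class label set (seven vendor systems plus genuine speech), and CNN layout are illustrative assumptions, not the paper's baseline configuration.

```python
# Minimal system-fingerprint recognition (SFR) baseline sketch.
# Assumptions (not from the paper): 16 kHz audio, 80 log-mel features,
# 8 classes = 7 vendor systems + genuine speech, and an illustrative CNN.
import torch
import torch.nn as nn
import torchaudio

N_CLASSES = 8  # 7 synthesis systems + real speech (assumed label scheme)

class SFRBaseline(nn.Module):
    def __init__(self, n_mels: int = 80, n_classes: int = N_CLASSES):
        super().__init__()
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16_000, n_mels=n_mels)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # pool over frequency and time
        )
        self.head = nn.Linear(64, n_classes)   # system-fingerprint logits

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) raw mono waveform
        feats = torch.log(self.melspec(wav) + 1e-6).unsqueeze(1)
        emb = self.encoder(feats).flatten(1)
        return self.head(emb)

# Usage: logits = SFRBaseline()(torch.randn(4, 32000)); logits.argmax(-1) gives the predicted system.
```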
Related papers
- Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework [8.11594945165255]
The proliferation of audio deepfakes poses a growing threat to trust in digital communications.
We introduce LAVA, a hierarchical framework for audio deepfake detection and model recognition.
Two specialized classifiers operate on these features: Audio Deepfake Attribution (ADA), which identifies the generation technology, and Audio Deepfake Model Recognition (ADMR), which recognizes the specific generative model instance.
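A rough sketch of the two-level attribution idea described above: one head for the generation technology (ADA) and one for the specific model instance (ADMR), both operating on shared features. The shared feature source, class counts, and head design are assumptions for illustration; LAVA's actual autoencoder-based architecture is described in the paper.

```python
# Two-level attribution heads on shared features (illustrative only; class
# counts and the feature extractor are assumptions, not LAVA's actual design).
import torch
import torch.nn as nn

class TwoLevelAttribution(nn.Module):
    def __init__(self, feat_dim=256, n_technologies=5, n_model_instances=20):
        super().__init__()
        self.ada_head = nn.Linear(feat_dim, n_technologies)      # generation technology
        self.admr_head = nn.Linear(feat_dim, n_model_instances)  # specific model instance

    def forward(self, features: torch.Tensor):
        # features: (batch, feat_dim) embeddings, e.g. from an autoencoder bottleneck
        return self.ada_head(features), self.admr_head(features)
```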
arXiv Detail & Related papers (2025-08-04T15:31:13Z)
- End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation [8.11594945165255]
We propose an end-to-end deep learning framework for audio deepfake detection that operates directly on raw waveforms.
Our model, RawNetLite, is a lightweight convolutional-recurrent architecture designed to capture both spectral and temporal features without handcrafted preprocessing.
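The entry above describes a lightweight convolutional-recurrent model over raw waveforms; the sketch below shows that general pattern (1-D convolutions followed by a GRU). Kernel sizes, strides, and widths are placeholders, not RawNetLite's published configuration.

```python
# Illustrative conv-recurrent detector on raw waveforms (not RawNetLite's exact layers).
import torch
import torch.nn as nn

class RawConvGRU(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(                       # local spectral-like features
            nn.Conv1d(1, 32, kernel_size=64, stride=8), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
        )
        self.gru = nn.GRU(64, 64, batch_first=True)      # temporal modelling
        self.head = nn.Linear(64, 1)                     # real-vs-fake logit

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples)
        h = self.conv(wav.unsqueeze(1)).transpose(1, 2)  # (batch, time, 64)
        _, last = self.gru(h)
        return self.head(last[-1])
```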
arXiv Detail & Related papers (2025-04-29T16:38:23Z)
- Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning [3.453303606167197]
We show that two of the most widely used audio-video deepfake datasets suffer from a previously unidentified spurious feature: the leading silence.
Fake videos start with a very brief moment of silence, and based on this feature alone we can separate the real and fake samples almost perfectly.
We propose a shift from supervised to unsupervised learning by training models exclusively on real data.
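To make the reported shortcut concrete, the sketch below measures the leading-silence duration of a waveform and thresholds it. The energy threshold, frame length, and duration cutoff are arbitrary illustrative values, not the ones used in the paper.

```python
# Sketch of the leading-silence shortcut: classify by how long the clip stays
# below an energy threshold at the start. Threshold values are illustrative.
import numpy as np

def leading_silence_seconds(wav: np.ndarray, sr: int, energy_thresh: float = 1e-3,
                            frame_len: int = 400) -> float:
    frames = [wav[i:i + frame_len] for i in range(0, len(wav) - frame_len, frame_len)]
    for idx, frame in enumerate(frames):
        if np.mean(frame ** 2) > energy_thresh:   # first non-silent frame
            return idx * frame_len / sr
    return len(wav) / sr                          # entirely silent

def shortcut_is_fake(wav: np.ndarray, sr: int, cutoff_s: float = 0.1) -> bool:
    # The dataset artifact: fakes tend to begin with a brief silence.
    return leading_silence_seconds(wav, sr) >= cutoff_s
```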
arXiv Detail & Related papers (2024-11-29T18:58:20Z)
- SafeEar: Content Privacy-Preserving Audio Deepfake Detection [17.859275594843965]
We propose SafeEar, a novel framework that aims to detect deepfake audio without access to the speech content within.
Our key idea is to devise a neural audio codec into a novel decoupling model that well separates the semantic and acoustic information in audio samples.
In this way, no semantic content will be exposed to the detector.
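A schematic sketch of the privacy-preserving split described above: a decoupling front end produces separate "semantic" and "acoustic" representations, and only the acoustic branch reaches the detector. The two placeholder encoders stand in for SafeEar's neural-codec-based decoupling.

```python
# Conceptual sketch: only acoustic-side features reach the deepfake detector,
# so semantic (content) information is never exposed. Encoders are placeholders.
import torch
import torch.nn as nn

class PrivacyPreservingDetector(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.semantic_enc = nn.GRU(80, feat_dim, batch_first=True)  # content branch (unused by detector)
        self.acoustic_enc = nn.GRU(80, feat_dim, batch_first=True)  # prosody/timbre-like cues
        self.detector = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, 80) frame-level features
        _, h_acoustic = self.acoustic_enc(feats)
        return self.detector(h_acoustic[-1])    # semantic branch never feeds the detector
```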
arXiv Detail & Related papers (2024-09-14T02:45:09Z)
- Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization [3.9440964696313485]
In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity.
Deepfakes based on multi-modal manipulation, such as audio-visual, are more realistic and pose a greater threat.
We propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection.
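A minimal cross-modal attention sketch in the spirit of the entry above: audio frames attend over video frames before a recurrent classifier (the mirrored video-to-audio direction could be added symmetrically). Dimensions and the single attention block are illustrative assumptions, not the paper's architecture.

```python
# Illustrative audio-to-visual cross-attention followed by an RNN classifier.
import torch
import torch.nn as nn

class CrossModalAttnDetector(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, audio_feats: torch.Tensor, video_feats: torch.Tensor) -> torch.Tensor:
        # audio_feats: (batch, Ta, dim), video_feats: (batch, Tv, dim)
        attended, _ = self.cross_attn(audio_feats, video_feats, video_feats)
        _, h = self.rnn(attended)
        return self.head(h[-1])
```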
arXiv Detail & Related papers (2024-08-02T18:45:01Z)
- A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection [17.285669984798975]
This paper addresses the challenge of developing a robust audio-visual deepfake detection model.
New generation algorithms are continually emerging, and these algorithms are not encountered during the development of detection methods.
We propose a multi-stream fusion approach with one-class learning as a representation-level regularization technique.
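As a hedged illustration of one-class learning as a representation-level regularizer, the sketch below uses a generic compactness term that pulls bona fide embeddings toward a center so unseen attacks fall outside the cluster; this is a stand-in, not the paper's exact objective.

```python
# Generic one-class compactness regularizer on embeddings of bona fide samples
# (a stand-in for representation-level one-class learning; not the paper's exact loss).
import torch

def one_class_compactness(embeddings: torch.Tensor, center: torch.Tensor) -> torch.Tensor:
    # embeddings: (batch, dim) features of real (bona fide) samples only.
    # Pull them toward a fixed or learned center so unseen fakes fall outside the cluster.
    return ((embeddings - center) ** 2).sum(dim=1).mean()

# Usage (hypothetical names): total_loss = bce_loss + lambda_reg * one_class_compactness(real_emb, center)
```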
arXiv Detail & Related papers (2024-06-20T10:33:15Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- Text-to-feature diffusion for audio-visual few-shot learning [59.45164042078649]
Few-shot learning from video data is a challenging and underexplored, yet much cheaper, setup.
We introduce a unified audio-visual few-shot video classification benchmark on three datasets.
We show that AV-DIFF obtains state-of-the-art performance on our proposed benchmark for audio-visual few-shot learning.
arXiv Detail & Related papers (2023-09-07T17:30:36Z)
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
Manipulated audio is generated by tampering only with the acoustic scene of the original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
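A simplified illustration of scene tampering: keep the speech and mix in a different ambient scene at a chosen SNR. The actual SceneFake construction pipeline is described in the paper; the mixing logic and SNR value here are assumptions for illustration only.

```python
# Simplified scene-tampering illustration: same speech, different background scene.
import numpy as np

def swap_acoustic_scene(speech: np.ndarray, new_scene: np.ndarray, snr_db: float = 10.0) -> np.ndarray:
    scene = np.resize(new_scene, speech.shape)                    # match lengths by tiling/truncating
    speech_pow = np.mean(speech ** 2) + 1e-12
    scene_pow = np.mean(scene ** 2) + 1e-12
    gain = np.sqrt(speech_pow / (scene_pow * 10 ** (snr_db / 10)))
    return speech + gain * scene                                  # speech unchanged, scene replaced
```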
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- Faked Speech Detection with Zero Prior Knowledge [2.407976495888858]
We introduce a neural network method to develop a classifier that will blindly classify an input audio as real or mimicked.
We propose a deep neural network following a sequential model that comprises three hidden layers, with alternating dense and dropout layers.
We were able to achieve at least 94% correct classification of the test cases, compared with 85% accuracy for human observers.
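A sketch of the sequential classifier as described: three hidden dense layers with interleaved dropout, ending in a real-vs-mimicked output. The input feature dimension, layer widths, and dropout rate are assumptions, not the paper's exact values.

```python
# Illustrative sequential model with alternating dense and dropout layers.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(40, 256), nn.ReLU(), nn.Dropout(0.3),   # hidden layer 1 + dropout
    nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),  # hidden layer 2 + dropout
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),   # hidden layer 3 + dropout
    nn.Linear(64, 1),                                 # real-vs-mimicked logit
)
```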
arXiv Detail & Related papers (2022-09-26T10:38:39Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
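As a generic illustration of the contrastive paradigm mentioned above, the sketch below pulls together audio and face-track embeddings from the same identity/clip and pushes apart mismatched pairs (an InfoNCE-style stand-in, not the paper's exact training recipe).

```python
# Generic cross-modal contrastive objective over paired audio/face embeddings.
import torch
import torch.nn.functional as F

def cross_modal_infonce(audio_emb: torch.Tensor, face_emb: torch.Tensor, temp: float = 0.07) -> torch.Tensor:
    # audio_emb, face_emb: (batch, dim); row i of each comes from the same identity/clip.
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(face_emb, dim=-1)
    logits = a @ v.t() / temp                           # pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)  # matching pairs lie on the diagonal
    return F.cross_entropy(logits, targets)
```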
arXiv Detail & Related papers (2022-04-06T20:51:40Z)