Deepfake audio detection by speaker verification
- URL: http://arxiv.org/abs/2209.14098v1
- Date: Wed, 28 Sep 2022 13:46:29 GMT
- Title: Deepfake audio detection by speaker verification
- Authors: Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, and Luisa Verdoliva
- Abstract summary: We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
- Score: 79.99653758293277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thanks to recent advances in deep learning, sophisticated generation tools
now exist that produce extremely realistic synthetic speech. However,
malicious uses of such tools are possible and likely, posing a serious threat
to our society. Hence, synthetic voice detection has become a pressing research
topic, and a large variety of detection methods have been recently proposed.
Unfortunately, they generalize poorly to synthetic audio generated by tools
never seen in the training phase, which makes them unfit to face real-world
scenarios. In this work, we aim at overcoming this issue by proposing a new
detection approach that leverages only the biometric characteristics of the
speaker, with no reference to specific manipulations. Since the detector is
trained only on real data, generalization is automatically ensured. The
proposed approach can be implemented based on off-the-shelf speaker
verification tools. We test several such solutions on three popular test sets,
obtaining good performance, high generalization ability, and high robustness to
audio impairment.
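The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that some off-the-shelf speaker verification model has already turned each recording into a fixed-length embedding, and that a few genuine recordings of the claimed speaker are available as references. The function names, the max-over-references scoring rule, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def identity_score(
    test_emb: Sequence[float],
    reference_embs: Sequence[Sequence[float]],
) -> float:
    """Highest similarity between the test embedding and any reference.

    Assumption: max-over-references is one plausible scoring rule;
    comparing against the mean reference embedding is another.
    """
    if not reference_embs:
        raise ValueError("at least one reference embedding is required")
    return max(cosine_similarity(test_emb, ref) for ref in reference_embs)


def is_likely_fake(
    test_emb: Sequence[float],
    reference_embs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical value; would be tuned on real data
) -> bool:
    """Flag the test audio when it does not match the claimed speaker."""
    return identity_score(test_emb, reference_embs) < threshold


# Toy 3-dimensional embeddings standing in for real model outputs.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
genuine_like = [0.95, 0.05, 0.0]
mismatched = [0.0, 1.0, 0.0]

print(is_likely_fake(genuine_like, references))
print(is_likely_fake(mismatched, references))
```

Because the decision relies only on genuine recordings of the speaker, no synthetic samples are needed to set it up, which is the property the abstract credits for generalization to unseen generators.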
Related papers
- Pitch Imperfect: Detecting Audio Deepfakes Through Acoustic Prosodic Analysis [6.858439600092057]
We explore the use of prosody, or the high-level linguistic features of human speech, as a more foundational means of detecting audio deepfakes.
We develop a detector based on six classical prosodic features and demonstrate that our model performs as well as other baseline models.
We show that we can explain the prosodic features that have highest impact on the model's decision.
arXiv Detail & Related papers (2025-02-20T16:52:55Z)
- Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits [82.8859060022651]
We introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox.
Subjective evaluations confirm that speech edited using this novel technique is more challenging to detect than conventional cut-and-paste methods.
Despite human difficulty, experimental results demonstrate that self-supervised-based detectors can achieve remarkable performance in detection, localization, and generalization.
arXiv Detail & Related papers (2025-01-07T14:17:47Z)
- SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
- Targeted Augmented Data for Audio Deepfake Detection [11.671275975119089]
We propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model.
Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities.
arXiv Detail & Related papers (2024-07-10T12:31:53Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- All-for-One and One-For-All: Deep learning-based feature fusion for Synthetic Speech Detection [18.429817510387473]
Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever.
In this paper, we consider three different feature sets proposed in the literature for the synthetic speech detection task and present a model that fuses them.
The system was tested on different scenarios and datasets to prove its robustness to anti-forensic attacks and its generalization capabilities.
arXiv Detail & Related papers (2023-07-28T13:50:25Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- Partially Fake Audio Detection by Self-attention-based Fake Span Discovery [89.21979663248007]
We propose a novel framework that introduces a question-answering (fake span discovery) strategy with a self-attention mechanism to detect partially fake audio.
Our submission ranked second in the partially fake audio detection track of ADD 2022.
arXiv Detail & Related papers (2022-02-14T13:20:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.