Related papers: Deepfake audio detection by speaker verification

URL: http://arxiv.org/abs/2209.14098v1
Date: Wed, 28 Sep 2022 13:46:29 GMT
Title: Deepfake audio detection by speaker verification
Authors: Alessandro Pianese and Davide Cozzolino and Giovanni Poggi and Luisa Verdoliva
Abstract summary: We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
Score: 79.99653758293277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Thanks to recent advances in deep learning, sophisticated generation tools exist, nowadays, that produce extremely realistic synthetic speech. However, malicious uses of such tools are possible and likely, posing a serious threat to our society. Hence, synthetic voice detection has become a pressing research topic, and a large variety of detection methods have been recently proposed. Unfortunately, they hardly generalize to synthetic audios generated by tools never seen in the training phase, which makes them unfit to face real-world scenarios. In this work, we aim at overcoming this issue by proposing a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. Since the detector is trained only on real data, generalization is automatically ensured. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
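The idea in the abstract, checking a clip against the claimed speaker's real voice rather than looking for traces of a specific generator, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how scores are computed or aggregated, so the cosine similarity, the max-over-references rule, the 0.5 threshold, and the names `identity_score` and `is_fake` are all assumptions made for illustration. The embeddings are toy vectors; in practice they would come from an off-the-shelf speaker verification model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identity_score(test_embedding, reference_embeddings):
    """Best match between the test clip and any real reference clip
    of the claimed speaker (assumed aggregation rule: maximum)."""
    return max(cosine_similarity(test_embedding, ref)
               for ref in reference_embeddings)


def is_fake(test_embedding, reference_embeddings, threshold=0.5):
    """Flag the clip as fake when it does not match the claimed speaker.
    The threshold is a hypothetical value, not taken from the paper."""
    return identity_score(test_embedding, reference_embeddings) < threshold


# Toy embeddings standing in for the output of a speaker verification model.
references = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]  # real clips of the speaker
genuine_clip = [0.85, 0.15, 0.05]                # close to the references
suspect_clip = [0.1, 0.2, 0.95]                  # far from the references

print(is_fake(genuine_clip, references))
print(is_fake(suspect_clip, references))
```

Because only real recordings of the speaker are needed as references, a detector of this shape never sees synthetic audio during setup, which is the property the abstract relies on for generalization.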

Related papers

Measuring the Robustness of Audio Deepfake Detectors [59.09338266364506]
This work systematically evaluates the robustness of 10 audio deepfake detection models against 16 common corruptions. Using both traditional deep learning models and state-of-the-art foundation models, we make four unique observations.
arXiv Detail & Related papers (2025-03-21T23:21:17Z)
Pitch Imperfect: Detecting Audio Deepfakes Through Acoustic Prosodic Analysis [6.858439600092057]
We explore the use of prosody, or the high-level linguistic features of human speech, as a more foundational means of detecting audio deepfakes. We develop a detector based on six classical prosodic features and demonstrate that our model performs as well as other baseline models. We show that we can explain the prosodic features that have highest impact on the model's decision.
arXiv Detail & Related papers (2025-02-20T16:52:55Z)
Detecting the Undetectable: Assessing the Efficacy of Current Spoof Detection Methods Against Seamless Speech Edits [82.8859060022651]
We introduce the Speech INfilling Edit (SINE) dataset, created with Voicebox. Subjective evaluations confirm that speech edited using this novel technique is more challenging to detect than conventional cut-and-paste methods. Despite human difficulty, experimental results demonstrate that self-supervised-based detectors can achieve remarkable performance in detection, localization, and generalization.
arXiv Detail & Related papers (2025-01-07T14:17:47Z)
SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
Targeted Augmented Data for Audio Deepfake Detection [11.671275975119089]
We propose a novel augmentation method for generating audio pseudo-fakes targeting the decision boundary of the model. Inspired by adversarial attacks, we perturb original real data to synthesize pseudo-fakes with ambiguous prediction probabilities.
arXiv Detail & Related papers (2024-07-10T12:31:53Z)
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification [73.30974350776636]
This paper comprehensively compares mainstream purification techniques in a unified framework. We propose an easy-to-follow ensemble approach that integrates advanced purification modules for detection.
arXiv Detail & Related papers (2023-12-14T03:04:05Z)
All-for-One and One-For-All: Deep learning-based feature fusion for Synthetic Speech Detection [18.429817510387473]
Recent advances in deep learning and computer vision have made the synthesis and counterfeiting of multimedia content more accessible than ever. In this paper, we consider three different feature sets proposed in the literature for the synthetic speech detection task and present a model that fuses them. The system was tested on different scenarios and datasets to prove its robustness to anti-forensic attacks and its generalization capabilities.
arXiv Detail & Related papers (2023-07-28T13:50:25Z)
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection [15.884911752869437]
We present a novel approach for synthetic speech detection, exploiting the combination of two high-level semantic properties of the human voice. On one side, we focus on speaker identity cues and represent them as speaker embeddings extracted using a state-of-the-art method for the automatic speaker verification task. On the other side, voice prosody, intended as variations in rhythm, pitch or accent in speech, is extracted through a specialized encoder.
arXiv Detail & Related papers (2022-10-31T11:03:03Z)
Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world. We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity. Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
Partially Fake Audio Detection by Self-attention-based Fake Span Discovery [89.21979663248007]
We propose a novel framework by introducing the question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audios. Our submission ranked second in the partially fake audio detection track of ADD 2022.
arXiv Detail & Related papers (2022-02-14T13:20:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.