PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
- URL: http://arxiv.org/abs/2402.18085v3
- Date: Tue, 01 Oct 2024 16:54:49 GMT
- Title: PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
- Authors: Govind Mittal, Arthur Jakobsson, Kelly O. Marshall, Chinmay Hegde, Nasir Memon,
- Abstract summary: PITCH is a robust challenge-response method to detect and tag interactive deepfake audio calls.
We developed a comprehensive taxonomy of audio challenges based on the human auditory system, linguistics, and environmental factors.
Our solution gave users maximum control and boosted detection accuracy to 84.5%.
- Score: 14.604998731837595
- License:
- Abstract: The rise of AI voice-cloning technology, particularly audio Real-time Deepfakes (RTDFs), has intensified social engineering attacks by enabling real-time voice impersonation that bypasses conventional enrollment-based authentication. To address this, we propose PITCH, a robust challenge-response method to detect and tag interactive deepfake audio calls. We developed a comprehensive taxonomy of audio challenges based on the human auditory system, linguistics, and environmental factors, yielding 20 prospective challenges. These were tested against leading voice-cloning systems using a novel dataset comprising 18,600 original and 1.6 million deepfake samples from 100 users. PITCH's prospective challenges enhanced machine detection capabilities to 88.7% AUROC score on the full unbalanced dataset, enabling us to shortlist 10 functional challenges that balance security and usability. For human evaluation and subsequent analyses, we filtered a challenging, balanced subset. On this subset, human evaluators independently scored 72.6% accuracy, while machines achieved 87.7%. Acknowledging that call environments require higher human control, we aided call receivers in making decisions with them using machines. Our solution uses an early warning system to tag suspicious incoming calls as "Deepfake-likely." Contrary to prior findings, we discovered that integrating human intuition with machine precision offers complementary advantages. Our solution gave users maximum control and boosted detection accuracy to 84.5%. Evidenced by this jump in accuracy, PITCH demonstrated the potential for AI-assisted pre-screening in call verification processes, offering an adaptable and usable approach to combat real-time voice-cloning attacks. Code to reproduce and access data at \url{https://github.com/mittalgovind/PITCH-Deepfakes}.
Related papers
- I Can Hear You: Selective Robust Training for Deepfake Audio Detection [16.52185019459127]
We establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples.
Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset.
We propose the F-SAT: Frequency-Selective Adversarial Training method focusing on high-frequency components.
arXiv Detail & Related papers (2024-10-31T18:21:36Z) - A Recurrent Neural Network Approach to the Answering Machine Detection Problem [0.0]
This paper presents an innovative approach to answering machine detection that leverages transfer learning through the YAMNet model for feature extraction.
The results demonstrate an accuracy of over 96% on the test set. Furthermore, we conduct an in-depth analysis of misclassified samples and reveal that an accuracy exceeding 98% can be achieved.
arXiv Detail & Related papers (2024-10-07T21:28:09Z) - SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.
It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.
It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024 [8.940008511570207]
This work details our approach to achieving a leading system with a 1.79% pooled equal error rate (EER)
The rapid advancement of generative AI models presents significant challenges for detecting AI-generated deepfake singing voices.
The Singing Voice Deepfake Detection (SVDD) Challenge 2024 aims to address this complex task.
arXiv Detail & Related papers (2024-09-03T21:28:45Z) - Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z) - GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response [17.117162678626418]
We propose a challenge-response approach that establishes authenticity in live settings.
We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines.
The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection.
arXiv Detail & Related papers (2022-10-12T13:15:54Z) - Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z) - Exploring linguistic feature and model combination for speech
recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z) - Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV)
We find that the difference between the ASV scores for the original and re-synthesize audio is a good indicator for discrimination between genuine and adversarial samples.
Our codes will be made open-source for future works to do comparison.
arXiv Detail & Related papers (2021-07-01T08:58:16Z) - Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural
Networks [68.8204255655161]
We adapt an ensemble of Convolutional Neural Networks to classify if a speaker is infected with COVID-19 or not.
Ultimately, it achieves an Unweighted Average Recall (UAR) of 74.9%, or an Area Under ROC Curve (AUC) of 80.7% by ensembling neural networks.
arXiv Detail & Related papers (2020-12-29T01:14:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.