Related papers: PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response

URL: http://arxiv.org/abs/2402.18085v4
Date: Mon, 26 May 2025 14:21:47 GMT
Title: PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Authors: Govind Mittal, Arthur Jakobsson, Kelly O. Marshall, Chinmay Hegde, Nasir Memon
Abstract summary: We develop PITCH, a robust challenge-response method to detect and tag interactive deepfake audio calls. PITCH's challenges enhanced machine detection capabilities to 88.7% AUROC score. We develop a novel human-AI collaborative system that tags suspicious calls as "Deepfake-likely."
Score: 14.604998731837595
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: The rise of AI voice-cloning technology, particularly audio Real-time Deepfakes (RTDFs), has intensified social engineering attacks by enabling real-time voice impersonation that bypasses conventional enrollment-based authentication. This technology represents an existential threat to phone-based authentication systems, while total identity fraud losses reached $43 billion. Unlike traditional robocalls, these personalized AI-generated voice attacks target high-value accounts and circumvent existing defensive measures, creating an urgent cybersecurity challenge. To address this, we propose PITCH, a robust challenge-response method to detect and tag interactive deepfake audio calls. We developed a comprehensive taxonomy of audio challenges based on the human auditory system, linguistics, and environmental factors, yielding 20 prospective challenges. Testing against leading voice-cloning systems using a novel dataset (18,600 original and 1.6 million deepfake samples from 100 users), PITCH's challenges enhanced machine detection capabilities to 88.7% AUROC score, enabling us to identify 10 highly-effective challenges. For human evaluation, we filtered a challenging, balanced subset on which human evaluators independently achieved 72.6% accuracy, while machines scored 87.7%. Recognizing that call environments require human control, we developed a novel human-AI collaborative system that tags suspicious calls as "Deepfake-likely." Contrary to prior findings, we discovered that integrating human intuition with machine precision offers complementary advantages, giving users maximum control while boosting detection accuracy to 84.5%. This significant improvement situates PITCH's potential as an AI-assisted pre-screener for verifying calls, offering an adaptable approach to combat real-time voice-cloning attacks while maintaining human decision authority.
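For readers unfamiliar with the AUROC metric quoted in the abstract, the following is a minimal sketch of how such a score can be computed from detector outputs. It is not the authors' evaluation code: the function name `auroc`, the label convention (1 = deepfake, 0 = genuine), and the toy scores are all assumptions made for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    (label 1, assumed here to mean "deepfake") receives a higher score
    than a randomly chosen negative sample (label 0); ties count half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"AUROC on toy data: {toy_auroc:.3f}")
```

The pairwise form is quadratic in the number of samples, so it only suits small illustrations. At the scale of the dataset described above, a sort-based implementation would be the practical choice.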

Related papers

Moravec's Paradox: Towards an Auditory Turing Test [0.0]
This research work demonstrates that current AI systems fail catastrophically on auditory tasks that humans perform effortlessly. We introduce an auditory Turing test comprising 917 challenges across seven categories: overlapping speech, speech in noise, temporal distortion, spatial audio, coffee-shop noise, phone distortion, and perceptual illusions. Our evaluation of state-of-the-art audio models including GPT-4's audio capabilities and OpenAI's Whisper reveals a striking failure rate exceeding 93%.
arXiv Detail & Related papers (2025-07-30T20:45:13Z)
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios. We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models. Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z)
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition [95.95622220065884]
The MISP 2025 Challenge focuses on multi-modal, multi-device meeting transcription by incorporating video modality alongside audio. The best-performing systems achieved significant improvements over the baseline.
arXiv Detail & Related papers (2025-05-20T06:11:51Z)
Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space [7.504214864070018]
Current invasive assistive technologies are designed to infer high-dimensional motor control signals from severely paralyzed patients. Noninvasive alternatives often rely on artifact-prone signals, require lengthy user training, and struggle to deliver robust high-dimensional control for dexterous tasks. This study introduces a novel human-centered multimodal AI approach as intelligent compensatory mechanisms for lost motor functions.
arXiv Detail & Related papers (2025-05-16T15:31:40Z)
End-to-end Audio Deepfake Detection from RAW Waveforms: a RawNet-Based Approach with Cross-Dataset Evaluation [8.11594945165255]
We propose an end-to-end deep learning framework for audio deepfake detection that operates directly on raw waveforms. Our model, RawNetLite, is a lightweight convolutional-recurrent architecture designed to capture both spectral and temporal features without handcrafted preprocessing.
arXiv Detail & Related papers (2025-04-29T16:38:23Z)
Advanced Real-Time Fraud Detection Using RAG-Based LLMs [0.990597034655156]
We introduce a novel real-time fraud detection mechanism using Retrieval Augmented Generation technology. A key innovation of our system is the ability to update policies without retraining the entire model. This robust and flexible fraud detection system is well suited for real-world deployment.
arXiv Detail & Related papers (2025-01-25T17:58:05Z)
I Can Hear You: Selective Robust Training for Deepfake Audio Detection [16.52185019459127]
We establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples. Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset. We propose the F-SAT: Frequency-Selective Adversarial Training method focusing on high-frequency components.
arXiv Detail & Related papers (2024-10-31T18:21:36Z)
A Recurrent Neural Network Approach to the Answering Machine Detection Problem [0.0]
This paper presents an innovative approach to answering machine detection that leverages transfer learning through the YAMNet model for feature extraction. The results demonstrate an accuracy of over 96% on the test set. Furthermore, we conduct an in-depth analysis of misclassified samples and reveal that an accuracy exceeding 98% can be achieved.
arXiv Detail & Related papers (2024-10-07T21:28:09Z)
SONAR: A Synthetic AI-Audio Detection Framework and Benchmark [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It aims to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024 [8.940008511570207]
This work details our approach to achieving a leading system with a 1.79% pooled equal error rate (EER). The rapid advancement of generative AI models presents significant challenges for detecting AI-generated deepfake singing voices. The Singing Voice Deepfake Detection (SVDD) Challenge 2024 aims to address this complex task.
arXiv Detail & Related papers (2024-09-03T21:28:45Z)
Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
Acoustic Cybersecurity: Exploiting Voice-Activated Systems [0.0]
Our research extends the feasibility of these attacks across various platforms like Amazon's Alexa, Android, iOS, and Cortana. We quantitatively show that attack success rates hover around 60%, with the ability to activate devices remotely from over 100 feet away. These attacks threaten critical infrastructure, emphasizing the need for multifaceted defensive strategies.
arXiv Detail & Related papers (2023-11-23T02:26:11Z)
GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response [17.117162678626418]
We propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines. The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection.
arXiv Detail & Related papers (2022-10-12T13:15:54Z)
Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
Exploring linguistic feature and model combination for speech recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques. Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems. This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z)
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection [62.23830810096617]
Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care to delay further progression. This paper presents the development of a state-of-the-art Conformer based speech recognition system built on the DementiaBank Pitt corpus for automatic AD detection.
arXiv Detail & Related papers (2022-06-23T12:50:55Z)
Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies. This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV). We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discrimination between genuine and adversarial samples. Our code will be made open-source so that future work can compare against it.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural Networks [68.8204255655161]
We adapt an ensemble of Convolutional Neural Networks to classify if a speaker is infected with COVID-19 or not. Ultimately, it achieves an Unweighted Average Recall (UAR) of 74.9%, or an Area Under ROC Curve (AUC) of 80.7% by ensembling neural networks.
arXiv Detail & Related papers (2020-12-29T01:14:17Z)
Adversarial vs behavioural-based defensive AI with joint, continual and active learning: automated evaluation of robustness to deception, poisoning and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security. In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.