AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
- URL: http://arxiv.org/abs/2402.18085v1
- Date: Wed, 28 Feb 2024 06:17:55 GMT
- Title: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
- Authors: Govind Mittal, Arthur Jakobsson, Kelly O. Marshall, Chinmay Hegde, Nasir Memon
- Abstract summary: Scammers are aggressively leveraging AI voice-cloning technology for social engineering attacks.
Real-time Deepfakes (RTDFs) can clone a target's voice in real-time over phone calls.
We introduce a robust challenge-response-based method to detect deepfake audio calls.
- Score: 14.604998731837595
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Scammers are aggressively leveraging AI voice-cloning technology for social engineering attacks, a situation significantly worsened by the advent of audio Real-time Deepfakes (RTDFs). RTDFs can clone a target's voice in real time over phone calls, making these interactions highly interactive and thus far more convincing. Our research addresses a gap in the existing literature on deepfake detection, which has largely been ineffective against RTDF threats. We introduce a robust challenge-response method to detect deepfake audio calls and propose a comprehensive taxonomy of audio challenges. Our evaluation pits 20 prospective challenges against a leading voice-cloning system. We compiled a novel open-source challenge dataset with contributions from 100 smartphone and desktop users, yielding 18,600 original and 1.6 million deepfake samples. Through rigorous machine and human evaluations of this dataset, we achieved an 86% deepfake detection rate and an 80% AUC score, respectively. Notably, using a set of 11 challenges significantly enhances detection. Our findings reveal that combining human intuition with machine precision offers complementary advantages. Consequently, we developed a human-AI collaborative system that melds human discernment with algorithmic accuracy, boosting final joint accuracy to 82.9%. This system highlights the advantage of AI-assisted pre-screening in call verification. Samples can be heard at https://mittalgovind.github.io/autch-samples/
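The workflow the abstract describes — issue audio challenges during a call, score each response with a detector, and combine machine scores with human judgment via AI-assisted pre-screening — can be sketched as follows. Everything here (the challenge list, the placeholder scorers, and the fusion rule) is an illustrative assumption for exposition, not the paper's actual implementation.

```python
# Hypothetical sketch of challenge-response call screening with
# human-AI score fusion. Challenge set, scorers, and fusion rule
# are illustrative assumptions, not the paper's method.
import random

CHALLENGES = [  # a tiny illustrative subset of an audio-challenge taxonomy
    "hum a short melody",
    "whisper the following sentence",
    "speak while cupping the microphone",
]

def machine_score(audio_response: bytes) -> float:
    """Placeholder deepfake detector: returns P(real) in [0, 1]."""
    random.seed(len(audio_response))  # deterministic stand-in for a model
    return random.random()

def human_score(audio_response: bytes) -> float:
    """Placeholder for a human rater's judgment, P(real) in [0, 1]."""
    random.seed(len(audio_response) + 1)
    return random.random()

def verify_call(responses: list[bytes], weight: float = 0.5,
                threshold: float = 0.5) -> bool:
    """Fuse machine and human scores over several challenge responses.

    AI-assisted pre-screening: only responses the machine is unsure
    about (score near the threshold) are escalated to a human rater.
    """
    fused_scores = []
    for resp in responses:
        m = machine_score(resp)
        if abs(m - threshold) < 0.2:  # machine uncertain -> escalate
            fused = weight * m + (1 - weight) * human_score(resp)
        else:
            fused = m
        fused_scores.append(fused)
    # Accept the call as genuine only if the average score clears the bar.
    return sum(fused_scores) / len(fused_scores) >= threshold
```

In this sketch, escalating only uncertain responses mirrors the pre-screening idea: the machine handles clear-cut cases, while human discernment is spent where it adds the most value.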
Related papers
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Retrieval-Augmented Audio Deepfake Detection [27.13059118273849]
We propose a retrieval-augmented detection framework that augments test samples with similar retrieved samples for enhanced detection.
Experiments show the superior performance of the proposed RAD framework over baseline methods.
arXiv Detail & Related papers (2024-04-22T05:46:40Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response [17.117162678626418]
We propose a challenge-response approach that establishes authenticity in live settings.
We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines.
The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection.
arXiv Detail & Related papers (2022-10-12T13:15:54Z)
- Deepfake audio detection by speaker verification [79.99653758293277]
We propose a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations.
The proposed approach can be implemented based on off-the-shelf speaker verification tools.
We test several such solutions on three popular test sets, obtaining good performance, high generalization ability, and high robustness to audio impairment.
arXiv Detail & Related papers (2022-09-28T13:46:29Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Blackbox Untargeted Adversarial Testing of Automatic Speech Recognition Systems [1.599072005190786]
Speech recognition systems are prevalent in applications for voice navigation and voice control of domestic appliances.
Deep neural networks (DNNs) have been shown to be susceptible to adversarial perturbations.
To help test the correctness of ASR systems, we propose techniques that automatically generate blackbox adversarial tests.
arXiv Detail & Related papers (2021-12-03T10:21:47Z)
- Human Perception of Audio Deepfakes [0.0]
We compare the ability of humans and machines in detecting audio deepfakes.
We found that the machine generally outperforms the humans in detecting audio deepfakes.
Younger participants are on average better at detecting audio deepfakes than older participants, while IT-professionals hold no advantage over laymen.
arXiv Detail & Related papers (2021-07-20T09:19:42Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples.
Our code will be made open-source to enable comparisons in future work.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural Networks [68.8204255655161]
We adapt an ensemble of Convolutional Neural Networks to classify whether a speaker is infected with COVID-19.
By ensembling neural networks, it ultimately achieves an Unweighted Average Recall (UAR) of 74.9%, or an Area Under the ROC Curve (AUC) of 80.7%.
arXiv Detail & Related papers (2020-12-29T01:14:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.