CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
- URL: http://arxiv.org/abs/2404.15854v1
- Date: Wed, 24 Apr 2024 13:10:35 GMT
- Title: CLAD: Robust Audio Deepfake Detection Against Manipulation Attacks with Contrastive Learning
- Authors: Haolin Wu, Jing Chen, Ruiying Du, Cong Wu, Kun He, Xingcan Shang, Hao Ren, Guowen Xu
- Abstract summary: We study the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks.
Even manipulations like volume control can significantly bypass detection without affecting human perception.
We propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks.
- Score: 20.625160354407974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing prevalence of audio deepfakes poses significant security threats, necessitating robust detection methods. While existing detection systems exhibit promise, their robustness against malicious audio manipulations remains underexplored. To bridge the gap, we undertake the first comprehensive study of the susceptibility of the most widely adopted audio deepfake detectors to manipulation attacks. Surprisingly, even manipulations like volume control can significantly bypass detection without affecting human perception. To address this, we propose CLAD (Contrastive Learning-based Audio deepfake Detector) to enhance the robustness against manipulation attacks. The key idea is to incorporate contrastive learning to minimize the variations introduced by manipulations, therefore enhancing detection robustness. Additionally, we incorporate a length loss, aiming to improve the detection accuracy by clustering real audios more closely in the feature space. We comprehensively evaluated the most widely adopted audio deepfake detection models and our proposed CLAD against various manipulation attacks. The detection models exhibited vulnerabilities, with FAR rising to 36.69%, 31.23%, and 51.28% under volume control, fading, and noise injection, respectively. CLAD enhanced robustness, reducing the FAR to 0.81% under noise injection and consistently maintaining an FAR below 1.63% across all tests. Our source code and documentation are available in the artifact repository (https://github.com/CLAD23/CLAD).
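To make the manipulations and the training idea in the abstract concrete, here is a minimal sketch, not the authors' implementation: the function names, the gain/SNR values, the in-batch contrastive form, and the exact shape of the length loss are all assumptions; the artifact repository linked above contains the real CLAD code.

```python
import torch
import torch.nn.functional as F


def volume_control(wav: torch.Tensor, gain: float = 0.5) -> torch.Tensor:
    """Scale waveform amplitude; a loudness change that barely affects human perception."""
    return wav * gain


def fade(wav: torch.Tensor, fade_len: int = 16000) -> torch.Tensor:
    """Apply a linear fade-in over the first `fade_len` samples."""
    n = min(fade_len, wav.shape[-1])
    ramp = torch.linspace(0.0, 1.0, steps=n, device=wav.device)
    out = wav.clone()
    out[..., :n] = out[..., :n] * ramp
    return out


def noise_injection(wav: torch.Tensor, snr_db: float = 30.0) -> torch.Tensor:
    """Add white Gaussian noise at a chosen signal-to-noise ratio."""
    signal_power = wav.pow(2).mean()
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return wav + torch.randn_like(wav) * noise_power.sqrt()


def clad_style_loss(encoder, wav, is_fake, temperature: float = 0.07, lam: float = 0.1):
    """One plausible contrastive-plus-length objective (an assumption, not the paper's
    exact loss): pull each clip and a manipulated view of it together in feature space,
    so manipulations change the features as little as possible, and shrink the feature
    norm of real audio so bona fide samples cluster tightly."""
    view = noise_injection(volume_control(wav))            # a manipulated positive view
    emb = encoder(wav)                                      # (B, D) embeddings of the clean batch
    z1 = F.normalize(emb, dim=-1)
    z2 = F.normalize(encoder(view), dim=-1)
    logits = z1 @ z2.t() / temperature                      # in-batch contrastive similarities
    targets = torch.arange(wav.shape[0], device=wav.device)
    contrastive = F.cross_entropy(logits, targets)
    # "Length loss": penalize the embedding norm of real audio only (is_fake == 0).
    length = (emb.norm(dim=-1) * (1 - is_fake.float())).mean()
    return contrastive + lam * length
```

The intent mirrors the abstract: if a clip and its manipulated copy map to nearly the same features, simple edits such as volume control, fading, or noise injection leave an attacker little room to move a fake across the decision boundary.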
Related papers
- I Can Hear You: Selective Robust Training for Deepfake Audio Detection [16.52185019459127]
We establish the largest public voice dataset to date, named DeepFakeVox-HQ, comprising 1.3 million samples.
Despite previously reported high accuracy, existing deepfake voice detectors struggle with our diversely collected dataset.
We propose F-SAT, a Frequency-Selective Adversarial Training method focusing on high-frequency components.
arXiv Detail & Related papers (2024-10-31T18:21:36Z)
- A Two-Stage Dual-Path Framework for Text Tampering Detection and Recognition [12.639006068141528]
Before the advent of deep learning, document tamper detection was difficult.
We have made some explorations in the field of text tamper detection based on deep learning.
Our Ps tamper detection method includes three steps: feature assistance, audit point positioning, and tamper recognition.
arXiv Detail & Related papers (2024-02-21T05:54:42Z)
- What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection [53.063161380423715]
Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types.
We propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection.
arXiv Detail & Related papers (2023-12-15T09:52:17Z)
- Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification [73.30974350776636]
This paper comprehensively compares mainstream purification techniques in a unified framework.
We propose an easy-to-follow ensemble approach that integrates advanced purification modules for detection.
arXiv Detail & Related papers (2023-12-14T03:04:05Z)
- Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection [88.74863771919445]
We reveal the vulnerability of AVASD models under audio-only, visual-only, and audio-visual adversarial attacks.
We also propose a novel audio-visual interaction loss (AVIL) to make it difficult for attackers to find feasible adversarial examples.
arXiv Detail & Related papers (2022-10-03T08:10:12Z)
- UNBUS: Uncertainty-aware Deep Botnet Detection System in Presence of Perturbed Samples [1.2691047660244335]
Botnet detection requires extremely low false-positive rates (FPR), which are not commonly attainable in contemporary deep learning.
This paper presents two LSTM-based classification algorithms for botnet classification, each achieving accuracy above 98%.
arXiv Detail & Related papers (2022-04-18T21:49:14Z)
- Self-supervised Transformer for Deepfake Detection [112.81127845409002]
Deepfake techniques in real-world scenarios require stronger generalization abilities of face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised transformer based audio-visual contrastive learning method.
arXiv Detail & Related papers (2022-03-02T17:44:40Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples.
Our code will be made open-source for future work to compare against.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- Realtime Robust Malicious Traffic Detection via Frequency Domain Analysis [14.211671196458477]
We propose Whisper, a realtime ML based malicious traffic detection system that achieves both high accuracy and high throughput.
Our experiments with 42 types of attacks demonstrate that Whisper can accurately detect various sophisticated and stealthy attacks, achieving up to 18.36% improvement.
Even under various evasion attacks, Whisper is still able to maintain around 90% detection accuracy.
arXiv Detail & Related papers (2021-06-28T13:38:05Z)
- Robust and Accurate Object Detection via Adversarial Learning [111.36192453882195]
This work augments the fine-tuning stage for object detectors by exploring adversarial examples.
Our approach boosts the performance of state-of-the-art EfficientDets by +1.1 mAP on the object detection benchmark.
arXiv Detail & Related papers (2021-03-23T19:45:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.