Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
- URL: http://arxiv.org/abs/2408.10853v1
- Date: Tue, 20 Aug 2024 13:45:34 GMT
- Title: Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
- Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye
- Abstract summary: Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs.
This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio.
Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving a 0% equal error rate under most ALM test conditions.
- Score: 40.38305757279412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio that pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based audio have become increasingly critical. This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio. Specifically, we collect 12 types of the latest ALM-based deepfake audio and utilize the latest CMs to evaluate them. Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving a 0% equal error rate under most ALM test conditions, which exceeded our expectations. This indicates promising directions for future research in ALM-based deepfake audio detection.
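Since every result here is reported as an equal error rate (EER), the threshold point where the false acceptance rate equals the false rejection rate, a minimal sketch of how EER is typically computed from detector scores may be useful. The `compute_eer` function and the toy scores below are illustrative assumptions, not code or data from the paper.

```python
import numpy as np

def compute_eer(bonafide_scores: np.ndarray, spoof_scores: np.ndarray) -> float:
    """Equal error rate: the operating point where the false acceptance
    rate (FAR) and false rejection rate (FRR) cross. Higher scores are
    assumed to indicate bona fide (real) audio."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones_like(bonafide_scores),
                             np.zeros_like(spoof_scores)])
    # Sweep thresholds over the pooled, sorted scores.
    labels = labels[np.argsort(scores)]
    n_bona, n_spoof = labels.sum(), (1 - labels).sum()
    # FRR: fraction of bona fide trials at or below each threshold (rejected).
    frr = np.cumsum(labels) / max(n_bona, 1)
    # FAR: fraction of spoof trials above each threshold (accepted).
    far = 1.0 - np.cumsum(1 - labels) / max(n_spoof, 1)
    # The EER sits where the two step functions cross.
    idx = np.nanargmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)

# A perfectly separated detector, as reported under most ALM test
# conditions, yields 0% EER on these toy scores.
real = np.array([0.95, 0.90, 0.80])
fake = np.array([0.20, 0.10, 0.05])
print(compute_eer(real, fake))  # -> 0.0
```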
Related papers
- SafeEar: Content Privacy-Preserving Audio Deepfake Detection [17.859275594843965]
We propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within.
Our key idea is to devise a neural audio codec into a novel decoupling model that well separates the semantic and acoustic information from audio samples.
In this way, no semantic content will be exposed to the detector.
arXiv Detail & Related papers (2024-09-14T02:45:09Z) - Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio [40.21394391724075]
Large Language Model (LLM) based deepfake audio creates an urgent need for effective detection methods.
We propose Codecfake, which is generated by seven representative neural codec methods.
Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models.
arXiv Detail & Related papers (2024-06-12T11:47:23Z) - The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio [42.84634652376024]
ALM-based deepfake audio is widespread, highly deceptive, and versatile in type.
To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method.
We propose the CSAM strategy to learn a domain-balanced and generalized minimum.
arXiv Detail & Related papers (2024-05-08T08:28:40Z) - An RFP dataset for Real, Fake, and Partially fake audio detection [0.36832029288386137]
The paper presents the RFP dataset, which comprises five distinct audio types: partial fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real.
The data are then used to evaluate several detection models, revealing that the available models incur a markedly higher equal error rate (EER) when detecting PF audio instead of entirely fake audio.
arXiv Detail & Related papers (2024-04-26T23:00:56Z) - Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion [70.99781219121803]
Audio Deepfake Detection (ADD) aims to detect the fake audio generated by text-to-speech (TTS), voice conversion (VC) and replay, etc.
We propose a novel ADD model, termed as M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process.
arXiv Detail & Related papers (2023-05-25T02:54:29Z) - SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio is generated by only tampering with the acoustic scene of an original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z) - Does Audio Deepfake Detection Generalize? [6.415366195115544]
We systematize audio spoofing detection by re-implementing and uniformly evaluating architectures from related work.
We publish a new dataset consisting of 37.9 hours of found audio recordings of celebrities and politicians, of which 17.2 hours are deepfakes.
Detection performance degrades markedly on this in-the-wild data, which may suggest that the community has tailored its solutions too closely to the prevailing ASVspoof benchmark and that deepfakes are much harder to detect outside the lab than previously thought.
arXiv Detail & Related papers (2022-03-30T12:48:22Z) - Active Audio-Visual Separation of Dynamic Sound Sources [93.97385339354318]
We propose a reinforcement learning agent equipped with a novel transformer memory that learns motion policies to control its camera and microphone.
We show that our model is able to learn efficient behavior to carry out continuous separation of a time-varying audio target.
arXiv Detail & Related papers (2022-02-02T02:03:28Z) - Half-Truth: A Partially Fake Audio Detection Dataset [60.08010668752466]
This paper develops a dataset for half-truth audio detection (HAD).
Partially fake audio in the HAD dataset involves changing only a few words in an utterance.
Using this dataset, we can not only detect fake utterances but also localize the manipulated regions in speech.
arXiv Detail & Related papers (2021-04-08T08:57:13Z)