Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
- URL: http://arxiv.org/abs/2408.10853v1
- Date: Tue, 20 Aug 2024 13:45:34 GMT
- Title: Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
- Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye
- Abstract summary: Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs.
This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio.
Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving a 0% equal error rate under most ALM test conditions.
- Score: 40.38305757279412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio that pose severe threats to society. Consequently, effective audio deepfake detection technologies for ALM-based audio have become increasingly critical. This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio. Specifically, we collect 12 types of the latest ALM-based deepfake audio and evaluate them with the latest CMs. Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving a 0% equal error rate under most ALM test conditions, which exceeded our expectations. This indicates promising directions for future research in ALM-based deepfake audio detection.
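Since every result here is reported as an equal error rate (EER), a minimal sketch of how EER is computed may be useful. This is an illustrative implementation, not the authors' evaluation code; the score arrays and the higher-score-means-bona-fide convention are assumptions.

```python
import numpy as np

def compute_eer(bonafide_scores: np.ndarray, spoof_scores: np.ndarray) -> float:
    """EER: the error rate at the threshold where the false acceptance rate
    (spoof accepted as real) equals the false rejection rate (real rejected
    as spoof)."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))  # crossing point of the two error curves
    return (far[i] + frr[i]) / 2.0

# Hypothetical countermeasure scores: higher means "more likely bona fide".
rng = np.random.default_rng(0)
bona = rng.normal(3.0, 1.0, 1000)    # bona fide utterances
spoof = rng.normal(-3.0, 1.0, 1000)  # ALM-generated utterances
print(f"EER: {compute_eer(bona, spoof):.2%}")  # well-separated scores yield ~0%
```

A 0% EER, as reported under most ALM test conditions, means the countermeasure's score distributions for bona fide and fake audio are perfectly separable at some threshold.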
Related papers
- AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? [65.49972312524724]
Multimodal large language models (MLLMs) have expanded their capabilities to include vision and audio modalities.
Our proposed DeafTest reveals that MLLMs often struggle with simple tasks humans find trivial.
We introduce AV-Odyssey Bench, a comprehensive audio-visual benchmark designed to assess whether these MLLMs can truly understand audio-visual information.
arXiv Detail & Related papers (2024-12-03T17:41:23Z) - SafeEar: Content Privacy-Preserving Audio Deepfake Detection [17.859275594843965]
We propose SafeEar, a novel framework that aims to detect deepfake audio without accessing the speech content within.
Our key idea is to devise a neural audio codec into a decoupling model that separates the semantic and acoustic information in audio samples.
In this way, no semantic content will be exposed to the detector.
arXiv Detail & Related papers (2024-09-14T02:45:09Z) - Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio [40.21394391724075]
Large Language Model (LLM) based deepfake audio creates an urgent need for effective detection methods.
We propose Codecfake, a dataset generated by seven representative neural codec methods.
Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models.
arXiv Detail & Related papers (2024-06-12T11:47:23Z) - The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio [42.84634652376024]
ALM-based deepfake audio is widespread, highly deceptive, and versatile in type.
To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method.
We propose the CSAM strategy to learn a domain-balanced and generalized minimum.
arXiv Detail & Related papers (2024-05-08T08:28:40Z) - An RFP dataset for Real, Fake, and Partially fake audio detection [0.36832029288386137]
The paper presents the RFP dataset, which comprises five distinct audio types: partial fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real.
The data are then used to evaluate several detection models, revealing that the available models incur a markedly higher equal error rate (EER) when detecting PF audio than when detecting entirely fake audio.
arXiv Detail & Related papers (2024-04-26T23:00:56Z) - Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion [70.99781219121803]
Audio Deepfake Detection (ADD) aims to detect the fake audio generated by text-to-speech (TTS), voice conversion (VC) and replay, etc.
We propose a novel ADD model, termed as M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process.
arXiv Detail & Related papers (2023-05-25T02:54:29Z) - SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio is generated by only tampering with the acoustic scene of an original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z) - Audio Deepfake Attribution: An Initial Dataset and Investigation [41.62487394875349]
We design the first deepfake audio dataset for the attribution of audio generation tools, called Audio Deepfake Attribution (ADA).
We propose the Class-Representation Multi-Center Learning (CRML) method for open-set audio deepfake attribution (OSADA).
Experimental results demonstrate that the CRML method effectively addresses open-set risks in real-world scenarios.
arXiv Detail & Related papers (2022-08-21T05:15:40Z) - Half-Truth: A Partially Fake Audio Detection Dataset [60.08010668752466]
This paper develops a dataset for half-truth audio detection (HAD).
Partially fake audio in the HAD dataset involves only changing a few words in an utterance.
Using this dataset, we can not only detect fake utterances but also localize the manipulated regions within a speech signal (a minimal frame-labeling sketch follows this list).
arXiv Detail & Related papers (2021-04-08T08:57:13Z)
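As noted in the Half-Truth entry above, localization requires labels at a finer granularity than the utterance. Below is a minimal, hypothetical sketch of deriving frame-level labels from manipulation timestamps; the 10 ms frame resolution and the (start, end) span format are assumptions, not the HAD dataset's actual annotation format.

```python
FRAME_MS = 10  # label resolution in milliseconds (an assumption)

def frame_labels(duration_ms: int, fake_spans_ms: list[tuple[int, int]]) -> list[int]:
    """Return one 0/1 label per frame: 1 inside a manipulated span."""
    n_frames = duration_ms // FRAME_MS
    labels = [0] * n_frames
    for start, end in fake_spans_ms:
        for i in range(start // FRAME_MS, min(end // FRAME_MS + 1, n_frames)):
            labels[i] = 1
    return labels

# A 3-second utterance in which one word (1200-1550 ms) was replaced.
labels = frame_labels(3000, [(1200, 1550)])
print(sum(labels), "of", len(labels), "frames labeled fake")  # 36 of 300
```

A detector trained against such labels can report not only that an utterance is fake, but which frames were tampered with.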