Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
- URL: http://arxiv.org/abs/2406.08112v1
- Date: Wed, 12 Jun 2024 11:47:23 GMT
- Title: Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
- Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi,
- Abstract summary: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods.
We propose the Codecfake dataset, which is generated by seven representative neural codec methods.
Experimental results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models.
- Score: 40.21394391724075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skipping the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose the Codecfake dataset, which is generated by seven representative neural codec methods. Experimental results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set.
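The fake half of such a dataset comes from codec resynthesis: real speech is encoded into discrete codec tokens and decoded back into a waveform, so a detector must key on codec artifacts rather than vocoder artifacts. Below is a minimal sketch of this round trip using Meta's EnCodec as one representative neural codec; the paper's seven-codec pipeline and exact settings are not reproduced here, and the file names are placeholders.

```python
import torch
import torchaudio
from encodec import EncodecModel          # pip install encodec
from encodec.utils import convert_audio

# Pretrained 24 kHz EnCodec model at a fixed target bandwidth.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)

# Load a bona fide utterance and match the codec's sample rate / channels.
wav, sr = torchaudio.load("real.wav")     # placeholder path
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    # Encode to discrete codes, then decode straight back to a waveform.
    encoded_frames = model.encode(wav.unsqueeze(0))
    codes = torch.cat([frame[0] for frame in encoded_frames], dim=-1)  # (B, n_q, T)
    resynth = model.decode(encoded_frames)

# The decoded clip is the "codec fake" counterpart of the real input.
torchaudio.save("codec_fake.wav", resynth.squeeze(0).cpu(), model.sample_rate)
```

The only difference between the real and fake sides of such a pair is the encode/decode round trip, which makes the data a clean probe of codec artifacts.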
Related papers
- Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio? [40.38305757279412]
Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs.
This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio.
Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving a 0% equal error rate under most ALM test conditions.
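Equal error rate (EER), the metric quoted throughout these summaries, is the operating point where the false acceptance rate on fakes equals the false rejection rate on bona fide audio. A minimal NumPy sketch using a brute-force threshold sweep (the convention that higher scores mean "more bona fide" is an assumption):

```python
import numpy as np

def compute_eer(bonafide_scores, fake_scores):
    """EER: threshold where false acceptance equals false rejection."""
    thresholds = np.sort(np.concatenate([bonafide_scores, fake_scores]))
    # False acceptance: fakes scored at/above threshold (accepted as real).
    far = np.array([(fake_scores >= t).mean() for t in thresholds])
    # False rejection: bona fide scored below threshold (rejected as fake).
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy usage: well-separated score distributions give a near-zero EER.
rng = np.random.default_rng(0)
print(compute_eer(rng.normal(2.0, 1.0, 1000), rng.normal(-2.0, 1.0, 1000)))
```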
arXiv Detail & Related papers (2024-08-20T13:45:34Z)
- Neural Speech and Audio Coding [19.437080345021105]
The paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems.
It introduces a neural network-based signal enhancer designed to post-process existing codecs' output.
The paper examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs.
arXiv Detail & Related papers (2024-08-13T15:13:21Z)
- The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio [42.84634652376024]
ALM-based deepfake audio is widespread, highly deceptive, and versatile in type.
To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method.
We propose the CSAM strategy to learn a domain-balanced and generalized minimum.
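CSAM builds on sharpness-aware minimization (SAM), which seeks flat minima by taking the gradient at an adversarially perturbed copy of the weights. The co-training and domain-balancing parts of CSAM are not reproduced here; this is a sketch of a vanilla SAM update in PyTorch, with `model`, `loss_fn`, `batch`, and `base_opt` as placeholders:

```python
import torch

def sam_step(model, loss_fn, batch, base_opt, rho=0.05):
    """One vanilla SAM update: perturb weights toward higher loss, then step."""
    # First pass: gradient at the current weights.
    loss_fn(model, batch).backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)

    # Climb to the locally sharpest point inside an L2 ball of radius rho.
    eps = {}
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps[p] = e
    model.zero_grad()

    # Second pass: the gradient at the perturbed weights drives the update.
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)  # restore the original weights
    base_opt.step()
    base_opt.zero_grad()
```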
arXiv Detail & Related papers (2024-05-08T08:28:40Z)
- Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion [70.99781219121803]
Audio Deepfake Detection (ADD) aims to detect fake audio generated by text-to-speech (TTS), voice conversion (VC), replay attacks, and the like.
We propose a novel ADD model, termed M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process.
arXiv Detail & Related papers (2023-05-25T02:54:29Z)
- An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio [53.134423013599914]
We propose a new problem: detecting the vocoder fingerprints of fake audio.
Experiments are conducted on the datasets synthesized by eight state-of-the-art vocoders.
arXiv Detail & Related papers (2022-08-20T09:23:21Z)
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation [78.4128796899781]
We propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.
The framework first uses the decoder to map the text features extracted by the text encoder to a mel-spectrogram with the help of the VQ-VAE, and the vocoder then transforms the generated mel-spectrogram into a waveform.
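The final stage of that pipeline, mel-spectrogram to waveform, is exactly what a vocoder does. As a stand-in for a neural vocoder, librosa's Griffin-Lim mel inversion illustrates the step; the mel-spectrogram here is computed from a bundled example clip purely for illustration, not from Diffsound's decoder:

```python
import librosa

# Stand-in "generated" mel-spectrogram (from a real clip, for illustration only).
y, sr = librosa.load(librosa.example("trumpet"), sr=22050)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80)

# Vocoder step: invert the mel-spectrogram back to a waveform (Griffin-Lim).
wav = librosa.feature.inverse.mel_to_audio(mel, sr=sr, n_fft=1024, hop_length=256)
```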
arXiv Detail & Related papers (2022-07-20T15:41:47Z)
- Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data [145.95460945321253]
We introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes.
The proposed Speech2C reduces the word error rate (WER) by a relative 19.2% over the method without decoder pre-training.
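A relative reduction is measured against the baseline's own error, so the absolute gain depends on where the baseline starts; a quick illustration with a hypothetical baseline WER:

```python
wer_baseline = 10.0                        # hypothetical baseline WER, in percent
wer_speech2c = wer_baseline * (1 - 0.192)  # 19.2% relative reduction -> 8.08
print(f"{wer_speech2c:.2f}%")
```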
arXiv Detail & Related papers (2022-03-31T15:33:56Z)
- DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding [71.73405116189531]
We propose a neural vocoder that extracts F0 and timbre/aperiodicity encodings from the input speech, emulating those defined in conventional vocoders.
As the deep neural analyzer is learnable, it is expected to be more accurate for signal reconstruction and manipulation, and generalizable from speech to singing.
arXiv Detail & Related papers (2021-10-13T01:39:57Z)
- Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder [29.63675159839434]
Flow-based neural vocoders have shown significant improvement in real-time speech generation tasks.
We propose audio dequantization methods in flow-based neural vocoders for high-fidelity audio generation (a generic dequantization sketch follows this list).
arXiv Detail & Related papers (2020-08-16T09:37:18Z)
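For context on the entry above, the standard form of audio dequantization adds noise inside each quantization bin so that integer sample values become continuous, which flow-based models need to define a proper density. A minimal sketch of plain uniform dequantization (the paper's specific dequantization variants are not reproduced):

```python
import numpy as np

def uniform_dequantize(x_int, num_bits=16, rng=None):
    """Add U[0, 1) noise within each quantization bin, then rescale to [0, 1).

    x_int: integer sample values in [0, 2**num_bits).
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(0.0, 1.0, size=x_int.shape)
    return (x_int.astype(np.float64) + noise) / 2.0 ** num_bits

# Toy usage: eight random 16-bit samples become continuous values in [0, 1).
x = np.random.default_rng(0).integers(0, 2**16, size=8)
print(uniform_dequantize(x))
```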
This list is automatically generated from the titles and abstracts of the papers on this site.