Betray Oneself: A Novel Audio DeepFake Detection Model via
Mono-to-Stereo Conversion
- URL: http://arxiv.org/abs/2305.16353v1
- Date: Thu, 25 May 2023 02:54:29 GMT
- Title: Betray Oneself: A Novel Audio DeepFake Detection Model via
Mono-to-Stereo Conversion
- Authors: Rui Liu, Jinhua Zhang, Guanglai Gao and Haizhou Li
- Abstract summary: Audio Deepfake Detection (ADD) aims to detect the fake audio generated by text-to-speech (TTS), voice conversion (VC) and replay, etc.
We propose a novel ADD model, termed as M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process.
- Score: 70.99781219121803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Audio Deepfake Detection (ADD) aims to detect fake audio generated
by text-to-speech (TTS), voice conversion (VC), replay, etc., and is an
emerging topic. Traditionally, the mono signal is taken as input and the focus
is on robust feature extraction and effective classifier design. However, the
dual-channel stereo information in the audio signal also carries important
cues for deepfakes, which has not been studied in prior work. In this paper,
we propose a novel ADD model, termed M2S-ADD, that attempts to discover audio
authenticity cues during the mono-to-stereo conversion process. We first
project the mono signal to stereo using a pretrained stereo synthesizer, then
employ a dual-branch neural architecture to process the left and right channel
signals, respectively. In this way, we effectively reveal the artifacts in
fake audio and thus improve ADD performance. Experiments on the ASVspoof2019
database show that M2S-ADD outperforms all baselines that take mono input. We
release the source code at \url{https://github.com/AI-S2-Lab/M2S-ADD}.
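The abstract describes a three-stage pipeline: mono-to-stereo projection, per-channel branch processing, and fusion into a detection score. The toy sketch below illustrates only that pipeline shape; `mono_to_stereo` and `branch_features` are hypothetical stand-ins, not the paper's pretrained stereo synthesizer or neural branches.

```python
# Toy sketch of the M2S-ADD pipeline shape. All function bodies here are
# illustrative stand-ins, not the actual model components.

def mono_to_stereo(mono, delay=2, gain=0.8):
    """Stand-in for the pretrained stereo synthesizer: the left channel is
    the original signal, the right a delayed, attenuated copy."""
    left = list(mono)
    right = [0.0] * delay + [gain * s for s in mono[:len(mono) - delay]]
    return left, right

def branch_features(channel):
    """Stand-in for one neural branch: mean absolute amplitude."""
    return sum(abs(s) for s in channel) / len(channel)

def m2s_score(mono):
    """Run both branches on the synthesized channels and fuse their
    outputs into a single score (the real model uses a learned fusion)."""
    left, right = mono_to_stereo(mono)
    return 0.5 * (branch_features(left) + branch_features(right))

signal = [0.1, -0.2, 0.3, -0.1, 0.2, 0.0, 0.1, -0.3]
score = m2s_score(signal)
```

The intuition carried over from the paper is that artifacts invisible in the mono waveform can surface as inconsistencies between the two synthesized channels, which the dual branches can then expose.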
Related papers
- Gotta Hear Them All: Sound Source Aware Vision to Audio Generation [13.55717701044619]
Vision-to-audio (V2A) has broad applications in multimedia.
We propose a Sound Source-Aware V2A (SSV2A) generator.
We show that SSV2A surpasses state-of-the-art methods in both generation fidelity and relevance.
arXiv Detail & Related papers (2024-11-23T04:27:19Z)
- Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio [40.21394391724075]
There is an urgent need for effective methods to detect Large Language Model (LLM) based deepfake audio.
We propose Codecfake, a dataset generated by seven representative neural codec methods.
Experiment results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models.
arXiv Detail & Related papers (2024-06-12T11:47:23Z)
- The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio [42.84634652376024]
ALM-based deepfake audio is widespread, highly deceptive, and versatile in type.
To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method.
We propose the CSAM strategy to learn a domain-balanced and generalized minimum.
arXiv Detail & Related papers (2024-05-08T08:28:40Z)
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [65.18102159618631]
Multimodal generative modeling has created milestones in text-to-image and text-to-video generation.
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
We propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps.
arXiv Detail & Related papers (2023-01-30T04:44:34Z)
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation [70.74377373885645]
We propose the first joint audio-video generation framework that brings engaging watching and listening experiences simultaneously.
MM-Diffusion consists of a sequential multi-modal U-Net for a joint denoising process by design.
Experiments show superior results in unconditional audio-video generation, and zero-shot conditional tasks.
arXiv Detail & Related papers (2022-12-19T14:11:52Z)
- ADD 2022: the First Audio Deep Synthesis Detection Challenge [92.41777858637556]
The first Audio Deep synthesis Detection challenge (ADD) was organized to fill this gap.
ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF), and audio fake game (FG).
arXiv Detail & Related papers (2022-02-17T03:29:20Z)
- Partially Fake Audio Detection by Self-attention-based Fake Span Discovery [89.21979663248007]
We propose a novel framework by introducing the question-answering (fake span discovery) strategy with the self-attention mechanism to detect partially fake audios.
Our submission ranked second in the partially fake audio detection track of ADD 2022.
arXiv Detail & Related papers (2022-02-14T13:20:55Z)
- Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation [96.18178553315472]
We propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio.
We integrate both stereo generation and source separation into a unified framework, Sep-Stereo.
arXiv Detail & Related papers (2020-07-20T06:20:26Z)
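Several detection papers above report equal error rate (EER), the operating point where the false-acceptance rate (a spoof accepted as genuine) equals the false-rejection rate (a genuine utterance rejected). A minimal threshold-sweep sketch follows; standard toolkits typically refine this by interpolating the ROC curve rather than picking the closest observed threshold.

```python
def compute_eer(genuine_scores, spoof_scores):
    """Approximate EER by sweeping thresholds over all observed scores.

    Convention assumed here: higher score = more likely genuine.
    far = fraction of spoof trials accepted (score >= threshold)
    frr = fraction of genuine trials rejected (score < threshold)
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            # Keep the threshold where FAR and FRR are closest;
            # report their midpoint as the EER estimate.
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

For a perfectly separable system the estimate is 0.0; for scores carrying no information it approaches 0.5 (50% EER), which is why reductions in average EER, like the 41.406% figure cited above, are the headline metric in these papers.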
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.