An Initial Investigation for Detecting Vocoder Fingerprints of Fake
Audio
- URL: http://arxiv.org/abs/2208.09646v1
- Date: Sat, 20 Aug 2022 09:23:21 GMT
- Title: An Initial Investigation for Detecting Vocoder Fingerprints of Fake
Audio
- Authors: Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao
Wang, Shiming Wang, Ruibo Fu
- Abstract summary: We propose a new problem for detecting vocoder fingerprints of fake audio.
Experiments are conducted on the datasets synthesized by eight state-of-the-art vocoders.
- Score: 53.134423013599914
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many effective methods have been proposed for fake audio detection.
However, they provide only detection results and no countermeasures to curb
the harm. Many practical applications also need to know which model or
algorithm generated the fake audio. Therefore, we propose a new problem:
detecting vocoder fingerprints of fake audio. Experiments are conducted on
datasets synthesized by eight state-of-the-art vocoders. We have
preliminarily explored the features and model architectures. The t-SNE
visualization shows that different vocoders generate distinct vocoder
fingerprints.
Related papers
- Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio [40.21394391724075]
There is an urgent need for effective methods to detect Large Language Model (LLM) based deepfake audio.
We propose Codecfake, which is generated by seven representative neural methods.
Experiment results show that neural-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models.
arXiv Detail & Related papers (2024-06-12T11:47:23Z)
- An RFP dataset for Real, Fake, and Partially fake audio detection [0.36832029288386137]
The paper presents the RFP dataset, which comprises five distinct audio types: partial fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real.
The data are then used to evaluate several detection models, revealing that the available models incur a markedly higher equal error rate (EER) when detecting PF audio than when detecting entirely fake audio.
arXiv Detail & Related papers (2024-04-26T23:00:56Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual modality or only the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio clip is generated by tampering only with the acoustic scene of the original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation [78.4128796899781]
We propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.
The framework first uses the decoder to transfer the text features extracted from the text encoder to a mel-spectrogram with the help of VQ-VAE, and then the vocoder is used to transform the generated mel-spectrogram into a waveform.
arXiv Detail & Related papers (2022-07-20T15:41:47Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- Partially Fake Audio Detection by Self-attention-based Fake Span Discovery [89.21979663248007]
We propose a novel framework that introduces a question-answering (fake span discovery) strategy with a self-attention mechanism to detect partially fake audio.
Our submission ranked second in the partially fake audio detection track of ADD 2022.
arXiv Detail & Related papers (2022-02-14T13:20:55Z)
- Audio Defect Detection in Music with Deep Networks [8.680081568962997]
Deliberate use of artefacts such as clicks in popular music calls for data-centric and context-sensitive solutions for detection.
We present a convolutional network architecture following an end-to-end encoder-decoder configuration to develop detectors for two exemplary audio defects.
arXiv Detail & Related papers (2022-02-11T15:56:14Z)