FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
- URL: http://arxiv.org/abs/2108.05080v2
- Date: Thu, 12 Aug 2021 03:26:20 GMT
- Title: FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
- Authors: Hasam Khalid and Shahroz Tariq and Simon S. Woo
- Abstract summary: Recently, a new problem of generating cloned or synthesized human voice of a person is emerging.
With the emerging threat of impersonation attacks using deepfake videos and audios, new deepfake detectors are needed that focus on both video and audio.
We propose a novel Audio-Video Deepfake dataset (FakeAVCeleb) that contains not only deepfake videos but also the respective synthesized cloned audios.
- Score: 21.199288324085444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the significant advancements made in the generation of forged video and
audio, commonly known as deepfakes, using deep learning technologies, their
misuse is now a well-known problem. Recently, a new problem of
generating cloned or synthesized human voice of a person is emerging. AI-based
deep learning models can synthesize any person's voice requiring just a few
seconds of audio. With the emerging threat of impersonation attacks using
deepfake videos and audios, new deepfake detectors are needed that focus on
both video and audio. Detecting deepfakes is a challenging task, and
researchers have made numerous attempts and proposed several deepfake detection
methods. To develop a good deepfake detector, a large amount of good-quality
data that captures real-world scenarios is needed. Many researchers have
contributed to this cause and provided several deepfake datasets, both self-generated
and in-the-wild. However, almost all of these datasets contain either deepfake
videos or deepfake audio. Moreover, the recent deepfake datasets proposed by researchers
have racial bias issues. Hence, there is a crucial need for a good
audio and video deepfake dataset. To fill this gap, we propose a novel
Audio-Video Deepfake dataset (FakeAVCeleb) that contains not only deepfake
videos but also the respective synthesized cloned audios. We generated our
dataset using the most popular recent deepfake generation methods, and the videos
and audios are perfectly lip-synced with each other. To generate a more
realistic dataset, we selected real YouTube videos of celebrities from four
racial backgrounds (Caucasian, Black, East Asian and South Asian) to counter
the racial bias issue. Lastly, we propose a novel multimodal detection method
that detects deepfake videos and audios based on our multimodal Audio-Video
deepfake dataset.
Related papers
- Hindi audio-video-Deepfake (HAV-DF): A Hindi language-based Audio-video Deepfake Dataset [11.164272928464879]
Fake videos or speeches in Hindi can have an enormous impact on rural and semi-urban communities.
This paper aims to create the first Hindi deepfake dataset, named "Hindi audio-video-Deepfake" (HAV-DF).
arXiv Detail & Related papers (2024-11-23T05:18:43Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual modality or only the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio is generated by only tampering with the acoustic scene of an original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z)
- DeePhy: On Deepfake Phylogeny [58.01631614114075]
DeePhy is a novel Deepfake Phylogeny dataset which consists of 5040 deepfake videos generated using three different generation techniques.
We present a benchmark on the DeePhy dataset using six deepfake detection algorithms.
arXiv Detail & Related papers (2022-09-19T15:30:33Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey [0.0]
This paper critically analyzes and provides a unique source of audio deepfake research, mostly ranging from 2016 to 2020.
This survey provides readers with a summary of 1) different deepfake categories 2) how they could be created and detected 3) the most recent trends in this domain and shortcomings in detection methods.
We found that Generative Adversarial Networks (GANs), Convolutional Neural Networks (CNNs), and Deep Neural Networks (DNNs) are common ways of creating and detecting deepfakes.
arXiv Detail & Related papers (2021-11-28T18:28:30Z)
- Evaluation of an Audio-Video Multimodal Deepfake Dataset using Unimodal and Multimodal Detectors [18.862258543488355]
Deepfakes can cause security and privacy issues.
A new domain of cloning human voices using deep-learning technologies is also emerging.
To develop a good deepfake detector, we need a detector that can detect deepfakes of multiple modalities.
arXiv Detail & Related papers (2021-09-07T11:00:20Z)
- Half-Truth: A Partially Fake Audio Detection Dataset [60.08010668752466]
This paper develops a dataset for half-truth audio detection (HAD).
Partially fake audio in the HAD dataset involves only changing a few words in an utterance.
We can not only detect fake utterances but also localize manipulated regions in a speech using this dataset.
arXiv Detail & Related papers (2021-04-08T08:57:13Z)
- WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection [82.42495493102805]
We introduce a new dataset, WildDeepfake, which consists of 7,314 face sequences extracted from 707 deepfake videos collected completely from the internet.
We conduct a systematic evaluation of a set of baseline detection networks on both existing and our WildDeepfake datasets, and show that WildDeepfake is indeed a more challenging dataset, where the detection performance can decrease drastically.
arXiv Detail & Related papers (2021-01-05T11:10:32Z)
- Deepfake detection: humans vs. machines [4.485016243130348]
We present a subjective study conducted in a crowdsourcing-like scenario, which systematically evaluates how hard it is for humans to tell whether a video is a deepfake.
For each video, a simple question, "Is the face of the person in the video real or fake?", was answered on average by 19 naive subjects.
The evaluation demonstrates that while human perception is very different from the perception of a machine, both are successfully fooled by deepfakes, but in different ways.
arXiv Detail & Related papers (2020-09-07T15:20:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.