The DeepSpeak Dataset
- URL: http://arxiv.org/abs/2408.05366v4
- Date: Sat, 26 Jul 2025 19:52:33 GMT
- Title: The DeepSpeak Dataset
- Authors: Sarah Barrington, Matyas Bohacek, Hany Farid
- Abstract summary: DeepSpeak is a diverse and multimodal dataset comprising over 100 hours of authentic and deepfake audiovisual content. We contribute: i) more than 50 hours of real, self-recorded data collected from 500 diverse and consenting participants using a custom-built data collection tool; ii) more than 50 hours of state-of-the-art audio and visual deepfakes generated using 14 video synthesis engines and three voice cloning engines; and iii) an embedding-based, identity-matching approach to ensure the creation of convincing, high-quality identity swaps.
- Score: 11.661238776379115
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deepfakes represent a growing concern across domains such as impostor hiring, fraud, and disinformation. Despite significant efforts to develop robust detection classifiers to distinguish the real from the fake, commonly used training datasets remain inadequate: relying on low-quality and outdated deepfake generators, consisting of content scraped from online repositories without participant consent, lacking in multimodal coverage, and rarely employing identity-matching protocols to ensure realistic fakes. To overcome these limitations, we present the DeepSpeak dataset, a diverse and multimodal dataset comprising over 100 hours of authentic and deepfake audiovisual content. We contribute: i) more than 50 hours of real, self-recorded data collected from 500 diverse and consenting participants using a custom-built data collection tool, ii) more than 50 hours of state-of-the-art audio and visual deepfakes generated using 14 video synthesis engines and three voice cloning engines, and iii) an embedding-based, identity-matching approach to ensure the creation of convincing, high-quality identity swaps that realistically simulate adversarial deepfake attacks. We also perform large-scale evaluations of state-of-the-art deepfake detectors and show that, without retraining, these detectors fail to generalize to the DeepSpeak dataset. These evaluations highlight the importance of a large and diverse dataset containing deepfakes from the latest generative-AI tools.
Related papers
- DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection [52.13851094326683]
Misuse of advanced generative AI models has resulted in the proliferation of falsified data.
Mega-MMDF is a large-scale, diverse, and high-quality dataset for multimodal deepfake detection.
DeepfakeBench-MM is the first unified benchmark for multimodal deepfake detection.
arXiv Detail & Related papers (2025-10-26T10:40:52Z) - AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds [38.75029700407531]
AUDETER is a large-scale, highly diverse deepfake audio dataset.
It consists of over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders with a broad range of TTS/vocoder patterns.
It is the largest deepfake audio dataset by scale.
arXiv Detail & Related papers (2025-09-04T16:03:44Z) - Evaluating Deepfake Detectors in the Wild [0.0]
We evaluate modern deepfake detectors, introducing a novel testing procedure designed to mimic real-world scenarios for deepfake detection.
Our analysis shows that detecting deepfakes remains a challenging task.
Basic image manipulations, such as JPEG compression or image enhancement, can significantly reduce model performance.
arXiv Detail & Related papers (2025-07-29T15:17:00Z) - DDL: A Large-Scale Datasets for Deepfake Detection and Localization in Diversified Real-World Scenarios [51.916287988122406]
We present a novel large-scale deepfake detection and localization (DDL) dataset containing over 1.4M forged samples.
Our DDL not only provides a more challenging benchmark for complex real-world forgeries but also offers crucial support for building next-generation deepfake detection, localization, and interpretability methods.
arXiv Detail & Related papers (2025-06-29T15:29:03Z) - Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook [101.30779332427217]
We survey deepfake generation and detection techniques, including the most recent developments in the field.
We identify various kinds of deepfakes, according to the procedure used to alter or generate the fake content.
We develop a novel multimodal benchmark to evaluate deepfake detectors on out-of-distribution content.
arXiv Detail & Related papers (2024-11-29T08:29:25Z) - Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization [3.9440964696313485]
In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity.
Deepfakes based on multi-modal manipulation, such as audio-visual, are more realistic and pose a greater threat.
We propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection.
arXiv Detail & Related papers (2024-08-02T18:45:01Z) - HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation [64.37874983401221]
We present HumanVid, the first large-scale high-quality dataset tailored for human image animation.
For the real-world data, we compile a vast collection of real-world videos from the internet.
For the synthetic data, we collected 10K 3D avatar assets and leveraged existing assets of body shapes, skin textures and clothings.
arXiv Detail & Related papers (2024-07-24T17:15:58Z) - DF40: Toward Next-Generation Deepfake Detection [62.073997142001424]
Existing works identify top-notch detection algorithms and models by adhering to the common practice: training detectors on one specific dataset and testing them on other prevalent deepfake datasets.
But can these stand-out "winners" be truly applied to tackle the myriad of realistic and diverse deepfakes lurking in the real world?
We construct a highly diverse deepfake detection dataset called DF40, which comprises 40 distinct deepfake techniques.
arXiv Detail & Related papers (2024-06-19T12:35:02Z) - PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset [7.952304417617302]
Multimodal deepfakes, which manipulate both audio and visual modalities, have drawn increasing public concern.
To address this gap, we propose a novel, multilingual, and multimodal deepfake dataset: PolyGlotFake.
It includes content in seven languages, created using a variety of cutting-edge and popular Text-to-Speech, voice cloning, and lip-sync technologies.
arXiv Detail & Related papers (2024-05-14T06:40:05Z) - Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes [49.81915942821647]
This paper aims to evaluate the human ability to discern deepfake videos through a subjective study.
We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models.
We found that all AI models performed better than humans when evaluated on the same 40 videos.
arXiv Detail & Related papers (2024-05-07T07:57:15Z) - Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a main issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z) - Linguistic Profiling of Deepfakes: An Open Database for Next-Generation
Deepfake Detection [40.20982463380279]
This paper introduces a deepfake database (DFLIP-3K) for the development of convincing and explainable deepfake detection.
It encompasses about 300K diverse deepfake samples from approximately 3K generative models, which boasts the largest number of deepfake models in the literature.
The two distinguished features enable DFLIP-3K to develop a benchmark that promotes progress in linguistic profiling of deepfakes.
arXiv Detail & Related papers (2024-01-04T16:19:52Z) - Vulnerability of Automatic Identity Recognition to Audio-Visual
Deepfakes [13.042731289687918]
We present the first realistic audio-visual database of deepfakes SWAN-DF, where lips and speech are well synchronized.
We demonstrate the vulnerability of a state-of-the-art speaker recognition system, such as the ECAPA-TDNN-based model from SpeechBrain.
arXiv Detail & Related papers (2023-11-29T14:18:04Z) - SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection [54.74467470358476]
This paper proposes a dataset for scene fake audio detection named SceneFake.
A manipulated audio is generated by only tampering with the acoustic scene of an original audio.
Some scene fake audio detection benchmark results on the SceneFake dataset are reported in this paper.
arXiv Detail & Related papers (2022-11-11T09:05:50Z) - DeePhy: On Deepfake Phylogeny [58.01631614114075]
DeePhy is a novel Deepfake Phylogeny dataset which consists of 5040 deepfake videos generated using three different generation techniques.
We present the benchmark on DeePhy dataset using six deepfake detection algorithms.
arXiv Detail & Related papers (2022-09-19T15:30:33Z) - Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z) - Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches contribute to exploring the specific artifacts in deepfake videos.
We propose to perform the deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z) - MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions [109.84031235538002]
We present MAD (Movie Audio Descriptions), a novel benchmark that departs from the paradigm of augmenting existing video datasets with text annotations.
MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video and exhibits a significant reduction in the currently diagnosed biases for video-language grounding datasets.
arXiv Detail & Related papers (2021-12-01T11:47:09Z) - Challenges and Solutions in DeepFakes [8.401473551081747]
Deep Fake is a recently emerged deep learning-powered application.
It helps create fake images and videos that humans cannot distinguish from real ones, and it is a recent off-the-shelf manipulation technique that allows swapping two identities in a single video.
We introduce a dataset of 140k real and fake faces, which contains 70k real faces from the Flickr dataset collected by Nvidia, as well as 70k fake faces sampled from 1 million fake faces generated by StyleGAN.
We will train our model on the dataset so that it can identify real or fake faces.
arXiv Detail & Related papers (2021-09-12T01:22:12Z) - FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset [21.199288324085444]
Recently, a new problem has emerged: generating a cloned or synthesized voice of a person.
With the emerging threat of impersonation attacks using deepfake videos and audio, new deepfake detectors are needed that focus on both video and audio.
We propose a novel Audio-Video Deepfake dataset (FakeAVCeleb) that contains not only deepfake videos but also the respective synthesized cloned audio.
arXiv Detail & Related papers (2021-08-11T07:49:36Z) - Detecting Deepfake Videos Using Euler Video Magnification [1.8506048493564673]
Deepfake videos are videos manipulated using advanced machine learning techniques.
In this paper, we examine a technique for possible identification of deepfake videos.
Our approach uses features extracted from the Euler technique to train three models to classify counterfeit and unaltered videos.
arXiv Detail & Related papers (2021-01-27T17:37:23Z) - WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection [82.42495493102805]
We introduce a new dataset WildDeepfake which consists of 7,314 face sequences extracted from 707 deepfake videos collected completely from the internet.
We conduct a systematic evaluation of a set of baseline detection networks on both existing and our WildDeepfake datasets, and show that WildDeepfake is indeed a more challenging dataset, where the detection performance can decrease drastically.
arXiv Detail & Related papers (2021-01-05T11:10:32Z) - Deepfake Video Forensics based on Transfer Learning [0.0]
"Deepfake" can create fake images and videos that humans cannot differentiate from the genuine ones.
This paper details retraining image classification models to learn the features from each deepfake video frame.
When checking Deepfake videos, this technique achieved more than 87 percent accuracy.
arXiv Detail & Related papers (2020-04-29T13:21:28Z) - DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection [93.24684159708114]
DeeperForensics-1.0 is the largest face forgery detection dataset by far, with 60,000 videos constituted by a total of 17.6 million frames.
The quality of generated videos outperforms those in existing datasets, validated by user studies.
The benchmark features a hidden test set, which contains manipulated videos achieving high deceptive scores in human evaluations.
arXiv Detail & Related papers (2020-01-09T14:37:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.