Related papers: DDL: A Large-Scale Datasets for Deepfake Detection and Localization in Diversified Real-World Scenarios

URL: http://arxiv.org/abs/2506.23292v2
Date: Thu, 30 Oct 2025 15:53:26 GMT
Title: DDL: A Large-Scale Datasets for Deepfake Detection and Localization in Diversified Real-World Scenarios
Authors: Changtao Miao, Yi Zhang, Weize Gao, Zhiya Tan, Weiwei Feng, Man Luo, Jianshu Li, Ajian Liu, Yunfeng Diao, Qi Chu, Tao Gong, Zhe Li, Weibin Yao, Joey Tianyi Zhou,
Abstract summary: We present a novel large-scale deepfake detection and localization (DDL) dataset containing over 1.4M forged samples. Our DDL not only provides a more challenging benchmark for complex real-world forgeries but also offers crucial support for building next-generation deepfake detection, localization, and interpretability methods.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Recent advances in AIGC have exacerbated the misuse of malicious deepfake content, making the development of reliable deepfake detection methods an essential means to address this challenge. Although existing deepfake detection models demonstrate outstanding performance in detection metrics, most methods only provide simple binary classification results, lacking interpretability. Recent studies have attempted to enhance the interpretability of classification results by providing spatial manipulation masks or temporal forgery segments. However, due to the limitations of forgery datasets, the practical effectiveness of these methods remains suboptimal. The primary reason lies in the fact that most existing deepfake datasets contain only binary labels, with limited variety in forgery scenarios, insufficient diversity in deepfake types, and relatively small data scales, making them inadequate for complex real-world scenarios. To address this predicament, we construct a novel large-scale deepfake detection and localization (DDL) dataset containing over 1.4M forged samples and encompassing up to 80 distinct deepfake methods. The DDL design incorporates four key innovations: (1) Comprehensive Deepfake Methods (covering 7 different generation architectures and a total of 80 methods), (2) Varied Manipulation Modes (incorporating 7 classic and 3 novel forgery modes), (3) Diverse Forgery Scenarios and Modalities (including 3 scenarios and 3 modalities), and (4) Fine-grained Forgery Annotations (providing 1.18M+ precise spatial masks and 0.23M+ precise temporal segments). Through these improvements, our DDL not only provides a more challenging benchmark for complex real-world forgeries but also offers crucial support for building next-generation deepfake detection, localization, and interpretability methods.

Related papers

MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios
We propose the Multi-dimensional Face Forgery Image (MFFI) dataset, tailored for real-world scenarios. MFFI enhances realism along four strategic dimensions: 1) Wider Forgery Methods; 2) Varied Facial Scenes; 3) Diversified Authentic Data; 4) Multi-level Degradation Operations. Benchmark evaluations show that MFFI outperforms existing public datasets in terms of scene complexity, cross-domain generalization capability, and detection difficulty gradients.
arXiv (2025-09-06)
Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning
We introduce HydraFake, a dataset that simulates real-world challenges with hierarchical generalization testing. Specifically, HydraFake involves diversified deepfake techniques and in-the-wild forgeries, along with a rigorous training and evaluation protocol. We propose Veritas, a deepfake detector based on a multi-modal large language model (MLLM).
arXiv (2025-08-28)
Practical Manipulation Model for Robust Deepfake Detection
We develop a more realistic degradation model in the area of image super-resolution. We extend the space of pseudo-fakes by using Poisson blending, more diverse masks, generator artifacts, and distractors. We show clear AUC increases of 3.51% and 6.21% on the DFDC and DFDCP datasets, respectively.
arXiv (2025-06-05)
Cross-Branch Orthogonality for Improved Generalization in Face Deepfake Detection
Deepfakes are becoming a nuisance to law enforcement authorities and the general public. Existing deepfake detectors struggle to keep up with the pace of improvements in deepfake generation. This paper proposes a new strategy that leverages coarse-to-fine spatial information, semantic information, and their interactions.
arXiv (2025-05-08)
FakeFormer: Efficient Vulnerability-Driven Transformers for Generalisable Deepfake Detection
This paper investigates why Vision Transformers (ViTs) exhibit suboptimal performance when detecting facial forgeries. We propose a deepfake detection framework called FakeFormer, which extends ViTs to enforce the extraction of subtle, inconsistency-prone information. Experiments are conducted on diverse well-known datasets, including FF++, Celeb-DF, WildDeepfake, DFD, DFDCP, and DFDC.
arXiv (2024-10-29)
Can We Leave Deepfake Data Behind in Training Deepfake Detector?
We rethink the role of blendfake in detecting deepfakes and formulate the process from "real to blendfake to deepfake" as a progressive transition. Our design allows forgery information from both blendfake and deepfake to be leveraged effectively and comprehensively.
arXiv (2024-08-30)
DF40: Toward Next-Generation Deepfake Detection
Existing works identify top-notch detection algorithms and models by adhering to a common practice: training detectors on one specific dataset and testing them on other prevalent deepfake datasets. But can these stand-out "winners" truly be applied to tackle the myriad realistic and diverse deepfakes lurking in the real world? We construct a highly diverse deepfake detection dataset called DF40, which comprises 40 distinct deepfake techniques.
arXiv (2024-06-19)
CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF). Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts. It adaptively decomposes facial features into deepfake-related and irrelevant information, using only the intrinsic deepfake-related information for real/fake discrimination.
arXiv (2023-09-30)
Locate and Verify: A Two-Stream Network for Improved Deepfake Detection
Current deepfake detection methods typically generalize poorly. We propose an innovative two-stream network that effectively enlarges the potential regions from which the model extracts evidence. We also propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level forged location annotations.
arXiv (2023-09-20)
DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues
Current deepfake detection models can generally recognize forged images after training on a large dataset. However, their accuracy degrades significantly on images generated by new deepfake methods because of the difference in data distribution. We present a novel incremental learning framework that improves the generalization of deepfake detection models.
arXiv (2023-09-18)
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning
We propose CFINet, a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks SODA-D and SODA-A.
arXiv (2023-08-18)
Towards General Visual-Linguistic Face Forgery Detection
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv (2023-07-31)
Voice-Face Homogeneity Tells Deepfake
Existing detection approaches focus on exploring specific artifacts in deepfake videos. We propose to perform deepfake detection from an unexplored voice-face matching view. Our model achieves significantly better performance than other state-of-the-art competitors.
arXiv (2022-03-04)
Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning
We propose to learn saliency from synthetic but clean labels, which naturally have higher pixel-labeling quality without the effort of manual annotation. We show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets.
arXiv (2022-02-26)
The DeepFake Detection Challenge (DFDC) Dataset
Deepfakes are a technique that allows anyone to swap two identities in a single video. To counter this emerging threat, we have constructed an extremely large face swap video dataset. All recorded subjects agreed to participate in and have their likenesses modified during the construction of the face-swapped dataset.
arXiv (2020-06-12)
