MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios
- URL: http://arxiv.org/abs/2509.05592v1
- Date: Sat, 06 Sep 2025 04:36:41 GMT
- Title: MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios
- Authors: Changtao Miao, Yi Zhang, Man Luo, Weiwei Feng, Kaiyuan Zheng, Qi Chu, Tao Gong, Jianshu Li, Yunfeng Diao, Wei Zhou, Joey Tianyi Zhou, Xiaoshuai Hao
- Abstract summary: We propose the Multi-dimensional Face Forgery Image (MFFI) dataset, tailored for real-world scenarios. MFFI enhances realism based on four strategic dimensions: 1) Wider Forgery Methods; 2) Varied Facial Scenes; 3) Diversified Authentic Data; 4) Multi-level Degradation Operations. Benchmark evaluations show that MFFI outperforms existing public datasets in terms of scene complexity, cross-domain generalization capability, and detection difficulty gradients.
- Score: 56.87612820699948
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Rapid advances in Artificial Intelligence Generated Content (AIGC) have enabled increasingly sophisticated face forgeries, posing a significant threat to social security. However, current Deepfake detection methods are limited by existing datasets, which lack the diversity found in real-world scenarios. Specifically, these datasets fall short in four key areas: coverage of advanced forgery techniques, variability of facial scenes, richness of real data, and degradation from real-world propagation. To address these challenges, we propose the Multi-dimensional Face Forgery Image (MFFI) dataset, tailored for real-world scenarios. MFFI enhances realism based on four strategic dimensions: 1) Wider Forgery Methods; 2) Varied Facial Scenes; 3) Diversified Authentic Data; 4) Multi-level Degradation Operations. MFFI integrates 50 different forgery methods and contains 1024K image samples. Benchmark evaluations show that MFFI outperforms existing public datasets in terms of scene complexity, cross-domain generalization capability, and detection difficulty gradients. These results validate the technical advance and practical utility of MFFI in simulating real-world conditions. The dataset and additional details are publicly available at https://github.com/inclusionConf/MFFI.
Related papers
- Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection [23.328598687742712]
We make the first attempt and propose FS-VFM to learn fundamental representations of real face images. We introduce three learning objectives, namely 3C, that synergize masked image modeling (MIM) and instance discrimination (ID). We present a reliable self-distillation mechanism that seamlessly couples MIM with ID to establish underlying local-to-global correspondence. Experiments on 11 public benchmarks demonstrate that our FS-VFM consistently generalizes better than diverse VFMs.
arXiv Detail & Related papers (2025-10-12T15:38:03Z)
- Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning [45.99344620383706]
We introduce HydraFake, a dataset that simulates real-world challenges with hierarchical generalization testing. Specifically, HydraFake involves diversified deepfake techniques and in-the-wild forgeries, along with rigorous training and evaluation protocol. We propose Veritas, a multi-modal large language model (MLLM) based deepfake detector.
arXiv Detail & Related papers (2025-08-28T17:53:05Z)
- DDL: A Large-Scale Datasets for Deepfake Detection and Localization in Diversified Real-World Scenarios [51.916287988122406]
We present a novel large-scale deepfake detection and localization (DDL) dataset containing over 1.4M forged samples. Our DDL not only provides a more challenging benchmark for complex real-world forgeries but also offers crucial support for building next-generation deepfake detection, localization, and interpretability methods.
arXiv Detail & Related papers (2025-06-29T15:29:03Z)
- Real-World Remote Sensing Image Dehazing: Benchmark and Baseline [19.747354924759104]
The scarcity of real-world remote sensing hazy image pairs has compelled existing methods to rely primarily on synthetic datasets. We introduce Real-World Remote Sensing Hazy Image dataset (RRSHID), the first large-scale dataset featuring real-world hazy and dehazed image pairs. Based on this, we propose MCAF-Net, a novel framework tailored for real-world RSID.
arXiv Detail & Related papers (2025-03-23T07:15:46Z)
- FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant [59.2438504610849]
We introduce FFAA: Face Forgery Analysis Assistant, consisting of a fine-tuned Multimodal Large Language Model (MLLM) and Multi-answer Intelligent Decision System (MIDS).
Our method not only provides user-friendly and explainable results but also significantly boosts accuracy and robustness compared to previous methods.
arXiv Detail & Related papers (2024-08-19T15:15:20Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results [73.98594459933008]
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems.
This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets.
We introduce the Wild Face Anti-Spoofing dataset, a large-scale, diverse FAS dataset collected in unconstrained settings.
arXiv Detail & Related papers (2023-04-12T10:29:42Z)
- Real Face Foundation Representation Learning for Generalized Deepfake Detection [74.4691295738097]
The emergence of deepfake technologies has become a matter of social concern as they pose threats to individual privacy and public security.
It is almost impossible to collect sufficient representative fake faces, and it is hard for existing detectors to generalize to all types of manipulation.
We propose Real Face Foundation Representation Learning (RFFR), which aims to learn a general representation from large-scale real face datasets.
arXiv Detail & Related papers (2023-03-15T08:27:56Z)
- GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection [29.118321046339656]
We propose a framework to learn rich and discriminative representations by combining multi-scale global features from the whole image with refined local features from informative patches for AI synthesized image detection.
GLFF fuses information from two branches: the global branch to extract multi-scale semantic features and the local branch to select informative patches for detailed local artifacts extraction.
arXiv Detail & Related papers (2022-11-16T02:03:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.