Improving Generalization in Deepfake Detection with Face Foundation Models and Metric Learning
- URL: http://arxiv.org/abs/2508.19730v2
- Date: Mon, 10 Nov 2025 12:17:56 GMT
- Title: Improving Generalization in Deepfake Detection with Face Foundation Models and Metric Learning
- Authors: Stelios Mylonas, Symeon Papadopoulos
- Abstract summary: We present a robust video deepfake detection framework with strong generalization. Our method is built on top of FSFM, a self-supervised model trained on real face data. We incorporate triplet loss variants during training, guiding the model to produce more separable embeddings between real and fake samples.
- Score: 11.097006771680896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing realism and accessibility of deepfakes have raised critical concerns about media authenticity and information integrity. Despite recent advances, deepfake detection models often struggle to generalize beyond their training distributions, particularly when applied to media content found in the wild. In this work, we present a robust video deepfake detection framework with strong generalization that takes advantage of the rich facial representations learned by face foundation models. Our method is built on top of FSFM, a self-supervised model trained on real face data, and is further fine-tuned using an ensemble of deepfake datasets spanning both face-swapping and face-reenactment manipulations. To enhance discriminative power, we incorporate triplet loss variants during training, guiding the model to produce more separable embeddings between real and fake samples. Additionally, we explore attribution-based supervision schemes, where deepfakes are categorized by manipulation type or source dataset, to assess their impact on generalization. Extensive experiments across diverse evaluation benchmarks demonstrate the effectiveness of our approach, especially in challenging real-world scenarios.
Related papers
- FAME: A Lightweight Spatio-Temporal Network for Model Attribution of Face-Swap Deepfakes [9.462613446025001]
Face-swap Deepfake videos pose growing risks to digital security, privacy, and media integrity. FAME is a framework designed to capture subtle artifacts specific to different face-generative models. Results show that FAME consistently outperforms existing methods in both accuracy and runtime.
arXiv Detail & Related papers (2025-06-13T05:47:09Z) - Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations [4.449835214520726]
Deepfake techniques are increasingly narrowing the gap between real and synthetic videos, raising serious privacy and security concerns. This work presents the first detection approach explicitly designed to generalize to localized edits in deepfake videos. Our method achieves a 20% improvement in accuracy over current state-of-the-art detection methods.
arXiv Detail & Related papers (2025-03-28T03:49:00Z) - Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection [16.21235742118949]
We propose a novel approach that repurposes a well-trained Vision-Language Model (VLM) for general deepfake detection. Motivated by the model reprogramming paradigm that manipulates the model prediction via input perturbations, our method can reprogram a pre-trained VLM. Experiments on several popular benchmark datasets demonstrate that the cross-dataset and cross-manipulation performance of deepfake detection can be significantly and consistently improved.
arXiv Detail & Related papers (2024-09-04T12:46:30Z) - Semantics-Oriented Multitask Learning for DeepFake Detection: A Joint Embedding Approach [77.65459419417533]
We propose an automated dataset expansion technique to support semantics-oriented DeepFake detection tasks. We also resort to the joint embedding of face images and labels (depicted by text descriptions) for prediction. Our method improves the generalizability of DeepFake detection and renders some degree of model interpretation by providing human-understandable explanations.
arXiv Detail & Related papers (2024-08-29T07:11:50Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model [16.69101880602321]
We propose a novel side-network-based decoder for generalized video-based Deepfake detection. We also introduce Facial Component Guidance (FCG) to enhance spatial learning generalizability. Our approach demonstrates promising generalizability on challenging Deepfake datasets.
arXiv Detail & Related papers (2024-04-08T14:58:52Z) - DeepFidelity: Perceptual Forgery Fidelity Assessment for Deepfake Detection [67.3143177137102]
Deepfake detection refers to detecting artificially generated or edited faces in images or videos.
We propose a novel Deepfake detection framework named DeepFidelity to adaptively distinguish real and fake faces.
arXiv Detail & Related papers (2023-12-07T07:19:45Z) - CrossDF: Improving Cross-Domain Deepfake Detection with Deep Information Decomposition [53.860796916196634]
We propose a Deep Information Decomposition (DID) framework to enhance the performance of Cross-dataset Deepfake Detection (CrossDF).
Unlike most existing deepfake detection methods, our framework prioritizes high-level semantic features over specific visual artifacts.
It adaptively decomposes facial features into deepfake-related and irrelevant information, only using the intrinsic deepfake-related information for real/fake discrimination.
arXiv Detail & Related papers (2023-09-30T12:30:25Z) - Real Face Foundation Representation Learning for Generalized Deepfake Detection [74.4691295738097]
The emergence of deepfake technologies has become a matter of social concern as they pose threats to individual privacy and public security.
It is almost impossible to collect sufficient representative fake faces, and it is hard for existing detectors to generalize to all types of manipulation.
We propose Real Face Foundation Representation Learning (RFFR), which aims to learn a general representation from large-scale real face datasets.
arXiv Detail & Related papers (2023-03-15T08:27:56Z) - Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)