Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection
- URL: http://arxiv.org/abs/2603.01450v1
- Date: Mon, 02 Mar 2026 04:58:00 GMT
- Title: Deepfake Forensics Adapter: A Dual-Stream Network for Generalizable Deepfake Detection
- Authors: Jianfeng Liao, Yichen Wei, Raymond Chan Ching Bon, Shulan Wang, Kam-Pui Chow, Kwok-Yan Lam
- Abstract summary: Deepfake Forensics Adapter (DFA) is a novel dual-stream framework that synergizes vision-language foundation models with targeted forensics analysis. Our approach integrates a pre-trained CLIP model with three core components to achieve specialized deepfake detection. Our framework not only demonstrates state-of-the-art performance but also points out a feasible and effective direction for developing a robust deepfake detection system.
- Score: 22.889849855283355
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of deepfake generation techniques poses significant threats to public safety and causes societal harm through the creation of highly realistic synthetic facial media. Because existing detection methods struggle to generalize to emerging forgery patterns, this paper presents the Deepfake Forensics Adapter (DFA), a novel dual-stream framework that synergizes vision-language foundation models with targeted forensics analysis. Our approach integrates a pre-trained CLIP model, whose parameters remain frozen, with three core components that leverage CLIP's powerful general capabilities for specialized deepfake detection: 1) a Global Feature Adapter identifies global inconsistencies in image content that may indicate forgery; 2) a Local Anomaly Stream enhances the model's ability to perceive local facial forgery cues by explicitly leveraging facial structure priors; and 3) an Interactive Fusion Classifier promotes deep interaction and fusion between global and local features using a transformer encoder. Extensive evaluations on frame-level and video-level benchmarks demonstrate the superior generalization of DFA, which achieves state-of-the-art performance on the challenging DFDC dataset with a frame-level AUC/EER of 0.816/0.256 and a video-level AUC/EER of 0.836/0.251, a 4.8% video-AUC improvement over previous methods. Our framework not only demonstrates state-of-the-art performance but also points out a feasible and effective direction for building robust deepfake detection systems with enhanced generalization against evolving deepfake threats. Our code is available at https://github.com/Liao330/DFA.git
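The abstract reports both frame-level and video-level AUC/EER. As a minimal sketch of how such metrics relate, the plain-Python snippet below aggregates per-frame fake probabilities into a video-level score and computes AUC and EER from scored examples. The mean-pooling aggregation is an assumption for illustration only; the paper does not state its exact video-level scoring rule, and the metric implementations here are generic, not the authors' evaluation code.

```python
# Hedged sketch: video-level scoring and AUC/EER from per-frame detector
# outputs. Labels: 1 = fake, 0 = real. Scores: predicted fake probability.

def video_score(frame_scores):
    """Aggregate per-frame fake probabilities by mean pooling (assumed rule)."""
    return sum(frame_scores) / len(frame_scores)

def roc_auc(labels, scores):
    """AUC as a rank statistic: the probability that a randomly chosen
    fake sample outscores a randomly chosen real one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def eer(labels, scores):
    """Equal error rate: the operating point where the false positive rate
    and false negative rate meet, approximated over observed thresholds."""
    best = 1.0
    for t in sorted(set(scores)):
        fpr = sum(1 for s, y in zip(scores, labels)
                  if y == 0 and s >= t) / labels.count(0)
        fnr = sum(1 for s, y in zip(scores, labels)
                  if y == 1 and s < t) / labels.count(1)
        best = min(best, max(fpr, fnr))
    return best
```

For a perfectly separated toy set (all fake scores above all real scores), `roc_auc` returns 1.0 and `eer` returns 0.0; on real benchmarks such as DFDC the two metrics move inversely, as in the 0.836/0.251 video-level pair reported above.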
Related papers
- Patch-Discontinuity Mining for Generalized Deepfake Detection [18.30761992906741]
Deepfake detection methods often rely on handcrafted forensic cues and complex architectures. We propose GenDF, a framework that transfers a powerful vision model to the deepfake detection task with a compact and neat network design. Experiments demonstrate that GenDF achieves state-of-the-art generalization performance in cross-domain and cross-manipulation settings.
arXiv Detail & Related papers (2025-12-26T13:18:14Z) - AdaptPrompt: Parameter-Efficient Adaptation of VLMs for Generalizable Deepfake Detection [7.76090543025328]
Recent advances in image generation have led to the widespread availability of highly realistic synthetic media, increasing the difficulty of reliable deepfake detection. A key challenge is generalization, as detectors trained on a narrow class of generators often fail when confronted with unseen models. We address the pressing need for generalizable detection by leveraging large vision-language models, specifically CLIP, to identify synthetic content across diverse generative techniques.
arXiv Detail & Related papers (2025-12-19T16:06:03Z) - DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis [59.8324489002129]
We introduce DeepShield, a deepfake detection framework that balances local sensitivity and global generalization to improve robustness across unseen forgeries. DeepShield applies temporal artifact modeling and patch-wise supervision to capture fine-grained inconsistencies often overlooked by global models.
arXiv Detail & Related papers (2025-10-29T07:35:29Z) - Deepfake Detection that Generalizes Across Benchmarks [48.85953407706351]
The generalization of deepfake detectors to unseen manipulation techniques remains a challenge for practical deployment. This work demonstrates that robust generalization is achievable through a parameter-efficient adaptation of one of the foundational pre-trained vision encoders. The proposed method achieves state-of-the-art performance, outperforming more complex, recent approaches in average cross-dataset AUROC.
arXiv Detail & Related papers (2025-08-08T12:03:56Z) - DDL: A Large-Scale Datasets for Deepfake Detection and Localization in Diversified Real-World Scenarios [51.916287988122406]
We present a novel large-scale deepfake detection and localization (DDL) dataset containing over 1.4M+ forged samples. Our DDL not only provides a more challenging benchmark for complex real-world forgeries but also offers crucial support for building next-generation deepfake detection, localization, and interpretability methods.
arXiv Detail & Related papers (2025-06-29T15:29:03Z) - HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection [4.908389661988192]
HFMF is a comprehensive two-stage deepfake detection framework. It integrates vision Transformers and convolutional nets through a hierarchical feature fusion mechanism. We demonstrate that our architecture achieves superior performance across diverse dataset benchmarks.
arXiv Detail & Related papers (2025-01-10T00:20:29Z) - UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z) - Locate and Verify: A Two-Stream Network for Improved Deepfake Detection [33.50963446256726]
Current deepfake detection methods are typically inadequate in generalizability.
We propose an innovative two-stream network that effectively enlarges the potential regions from which the model extracts evidence.
We also propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level forged location annotations.
arXiv Detail & Related papers (2023-09-20T08:25:19Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection [50.33525966541906]
Existing multimodal detection methods capture audio-visual inconsistencies to expose Deepfake videos.
We propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics.
Our model can be easily adapted to the downstream Deepfake datasets with fine-tuning.
arXiv Detail & Related papers (2023-06-12T06:06:05Z) - Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.