Forensics Adapter: Unleashing CLIP for Generalizable Face Forgery Detection
- URL: http://arxiv.org/abs/2411.19715v3
- Date: Fri, 23 May 2025 16:14:40 GMT
- Title: Forensics Adapter: Unleashing CLIP for Generalizable Face Forgery Detection
- Authors: Xinjie Cui, Yuezun Li, Delong Zhu, Jiaran Zhou, Junyu Dong, Siwei Lyu
- Abstract summary: We describe an adapter network designed to transform CLIP into an effective and generalizable face forgery detector. We introduce an adapter to learn face forgery traces -- the blending boundaries unique to forged faces, guided by task-specific objectives. With only 5.7M trainable parameters, our method achieves a significant performance boost, improving by approximately 7% on average across five standard datasets.
- Score: 55.142997327506706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe Forensics Adapter, an adapter network designed to transform CLIP into an effective and generalizable face forgery detector. Although CLIP is highly versatile, adapting it for face forgery detection is non-trivial as forgery-related knowledge is entangled with a wide range of unrelated knowledge. Existing methods treat CLIP merely as a feature extractor, lacking task-specific adaptation, which limits their effectiveness. To address this, we introduce an adapter to learn face forgery traces -- the blending boundaries unique to forged faces, guided by task-specific objectives. Then we enhance the CLIP visual tokens with a dedicated interaction strategy that communicates knowledge across CLIP and the adapter. Since the adapter operates alongside CLIP, CLIP's versatility is largely retained, naturally ensuring strong generalizability in face forgery detection. With only 5.7M trainable parameters, our method achieves a significant performance boost, improving by approximately 7% on average across five standard datasets. Additionally, we describe Forensics Adapter++, an extended method that incorporates the textual modality via a newly proposed forgery-aware prompt learning strategy. This extension leads to a further 1.3% performance boost over the original Forensics Adapter. We believe the proposed methods can serve as a baseline for future CLIP-based face forgery detection methods. The code has been released at https://github.com/OUC-VAS/ForensicsAdapter.
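As a rough illustration of the adapter-alongside-CLIP design described in the abstract, the sketch below freezes a CLIP visual encoder, runs a small trainable adapter stream in parallel, fuses the two token sets with cross-attention, and attaches an image-level forgery head plus an auxiliary boundary head for the task-specific objective. The class name, the convolutional adapter, and the cross-attention fusion are illustrative assumptions, not the authors' released implementation (see the GitHub link above for the actual code).

```python
# Minimal PyTorch sketch of an adapter running alongside a frozen CLIP visual encoder.
# Illustrative only: module names, shapes, and the fusion strategy are assumptions.
import torch
import torch.nn as nn

class ForensicsAdapterSketch(nn.Module):
    def __init__(self, clip_visual: nn.Module, clip_dim: int = 1024, adapter_dim: int = 192):
        super().__init__()
        self.clip_visual = clip_visual                  # pre-trained CLIP image encoder,
        for p in self.clip_visual.parameters():         # assumed to return patch tokens (B, N, clip_dim)
            p.requires_grad = False                     # frozen: only the adapter and heads are trainable

        # Lightweight adapter stream intended to pick up forgery traces (blending boundaries)
        self.adapter = nn.Sequential(
            nn.Conv2d(3, adapter_dim, kernel_size=3, stride=16, padding=1),
            nn.GELU(),
            nn.Conv2d(adapter_dim, clip_dim, kernel_size=1),
        )
        # Interaction between CLIP tokens and adapter tokens (cross-attention is an assumed choice)
        self.interact = nn.MultiheadAttention(clip_dim, num_heads=8, batch_first=True)
        self.cls_head = nn.Linear(clip_dim, 1)          # image-level real/fake logit
        self.boundary_head = nn.Linear(clip_dim, 1)     # token-level blending-boundary logit

    def forward(self, images: torch.Tensor):
        with torch.no_grad():
            clip_tokens = self.clip_visual(images)                            # (B, N, clip_dim)
        adapter_tokens = self.adapter(images).flatten(2).transpose(1, 2)      # (B, M, clip_dim)
        fused, _ = self.interact(clip_tokens, adapter_tokens, adapter_tokens) # (B, N, clip_dim)
        forgery_logit = self.cls_head(fused.mean(dim=1))                      # image-level score
        boundary_map = self.boundary_head(fused)                              # supervised with the
        return forgery_logit, boundary_map                                    # task-specific objective
```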
Related papers
- AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation [8.252046294696585]
We propose AF-CLIP (Anomaly-Focused CLIP) by dramatically enhancing its visual representations to focus on local defects.
Our approach introduces a lightweight adapter that emphasizes anomaly-relevant patterns in visual features.
Our method is also extended to few-shot scenarios by extra memory banks.
arXiv Detail & Related papers (2025-07-26T13:34:38Z) - MadCLIP: Few-shot Medical Anomaly Detection with CLIP [14.023527193608142]
An innovative few-shot anomaly detection approach is presented, leveraging the pre-trained CLIP model for medical data.
A dual-branch design is proposed to separately capture normal and abnormal features through learnable adapters.
To improve semantic alignment, learnable text prompts are employed to link visual features.
arXiv Detail & Related papers (2025-06-30T12:56:17Z) - AdaptCLIP: Adapting CLIP for Universal Visual Anomaly Detection [39.72202031440292]
Universal visual anomaly detection aims to identify anomalies from novel or unseen vision domains without additional fine-tuning.
Recent studies have demonstrated that pre-trained vision-language models like CLIP exhibit strong generalization with just zero or a few normal images.
We present a simple yet effective method called AdaptCLIP based on two key insights.
arXiv Detail & Related papers (2025-05-15T03:24:28Z) - Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection [23.48106270102081]
This paper tackles the challenge of detecting partially manipulated facial deepfakes.
We leverage the Contrastive Language-Image Pre-training (CLIP) model, specifically its ViT-L/14 visual encoder.
The proposed approach utilizes parameter-efficient fine-tuning (PEFT) techniques, such as LN-tuning, to adjust a small subset of the model's parameters.
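For context, LN-tuning can be sketched as freezing every CLIP parameter and re-enabling gradients only for the LayerNorm affine weights; the helper below is a minimal illustration of that general recipe, not the paper's code.

```python
# Minimal sketch of LN-tuning: train only the LayerNorm weights/biases of a frozen model.
import torch.nn as nn

def apply_ln_tuning(model: nn.Module) -> list:
    """Freeze all parameters, then re-enable gradients only for LayerNorm layers."""
    for p in model.parameters():
        p.requires_grad = False
    trainable = []
    for module in model.modules():
        if isinstance(module, nn.LayerNorm):
            for p in module.parameters():       # LayerNorm weight and bias
                p.requires_grad = True
                trainable.append(p)
    return trainable  # e.g. torch.optim.AdamW(apply_ln_tuning(clip_visual), lr=1e-4)
```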
arXiv Detail & Related papers (2025-03-25T14:10:54Z) - Adapter-Enhanced Semantic Prompting for Continual Learning [91.63494614012362]
Continual learning (CL) enables models to adapt to evolving data streams.
Traditional methods usually retain the past data for replay or add additional branches in the model to learn new knowledge.
We propose a novel lightweight CL framework, which integrates prompt tuning and adapter techniques.
arXiv Detail & Related papers (2024-12-15T06:14:55Z) - Generalizable Facial Expression Recognition [41.639746139849564]
SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the training set.
Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model.
This paper aims to improve the zero-shot generalization ability of FER methods on different unseen test sets using only one train set.
arXiv Detail & Related papers (2024-08-20T07:48:45Z) - C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection [98.34703790782254]
We introduce Category Common Prompt CLIP, which integrates the category common prompt into the text encoder to inject category-related concepts into the image encoder.
Our method achieves a 12.41% improvement in detection accuracy compared to the original CLIP, without introducing additional parameters during testing.
arXiv Detail & Related papers (2024-08-19T02:14:25Z) - Meta-Adapter: An Online Few-shot Learner for Vision-Language Model [64.21017759533474]
Contrastive vision-language pre-training, known as CLIP, demonstrates remarkable potential in perceiving open-world visual concepts.
Few-shot learning methods based on CLIP typically require offline fine-tuning of the parameters on few-shot samples.
We propose the Meta-Adapter, a lightweight residual-style adapter, to refine the CLIP features guided by the few-shot samples in an online manner.
arXiv Detail & Related papers (2023-11-07T07:27:16Z) - Side Adapter Network for Open-Vocabulary Semantic Segmentation [69.18441687386733]
This paper presents Side Adapter Network (SAN), a new framework for open-vocabulary semantic segmentation with a pre-trained vision-language model.
A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias.
Our approach significantly outperforms other counterparts, with up to 18 times fewer trainable parameters and 19 times faster inference speed.
arXiv Detail & Related papers (2023-02-23T18:58:28Z) - Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification [58.06983806317233]
Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for learning visual representations using large-scale image-text pairs.
To enhance CLIP's adaption capability, existing methods have proposed fine-tuning additional learnable modules.
We propose a training-free adaption method for CLIP to conduct few-shot classification, termed Tip-Adapter.
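For context, the training-free mechanism usually attributed to Tip-Adapter builds a key-value cache from the few-shot image features and their one-hot labels, then blends the cache predictions with CLIP's zero-shot logits. The sketch below illustrates that idea under assumed tensor shapes and hyperparameter values; it is not the authors' released code.

```python
# Illustrative sketch of a training-free cache model in the spirit of Tip-Adapter.
# Shapes, alpha, and beta are assumptions for the example.
import torch

def tip_adapter_logits(test_feat, cache_keys, cache_values, clip_text_weights,
                       alpha: float = 1.0, beta: float = 5.5):
    """test_feat: (B, D) L2-normalized CLIP image features of test images.
    cache_keys: (N*K, D) L2-normalized features of the few-shot support images.
    cache_values: (N*K, C) one-hot labels of the support images.
    clip_text_weights: (D, C) L2-normalized CLIP text embeddings of the class prompts."""
    clip_logits = 100.0 * test_feat @ clip_text_weights                # zero-shot CLIP logits (B, C)
    affinity = test_feat @ cache_keys.t()                              # similarity to cached features (B, N*K)
    cache_logits = torch.exp(-beta * (1.0 - affinity)) @ cache_values  # cache-model predictions (B, C)
    return clip_logits + alpha * cache_logits                          # blended few-shot prediction
```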
arXiv Detail & Related papers (2022-07-19T19:12:11Z) - Face Presentation Attack Detection using Taskonomy Feature [26.343512092423985]
Presentation Attack Detection (PAD) methods are critical to ensure the security of Face Recognition Systems (FRSs).
Existing PAD methods are highly dependent on the limited training set and cannot generalize well to unknown PAs.
We propose to apply taskonomy (task taxonomy) from other face-related tasks to solve face PAD.
arXiv Detail & Related papers (2021-11-22T08:35:26Z) - Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling [78.62723847797382]
We propose Training-Free CLIP-Adapter (Tip-Adapter), which not only inherits CLIP's training-free advantage but also performs comparably to or even better than CLIP-Adapter.
We conduct extensive experiments on few-shot classification on ImageNet and 10 other datasets to demonstrate the superiority of the proposed Tip-Adapter.
arXiv Detail & Related papers (2021-11-06T18:09:22Z) - BioMetricNet: deep unconstrained face verification through learning of metrics regularized onto Gaussian distributions [25.00475462213752]
We present BioMetricNet, a novel framework for deep unconstrained face verification.
The proposed approach does not impose any specific metric on facial features.
It shapes the decision space by learning a latent representation in which matching and non-matching pairs are mapped onto clearly separated and well-behaved target distributions.
arXiv Detail & Related papers (2020-08-13T17:22:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.