C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
- URL: http://arxiv.org/abs/2408.09647v1
- Date: Mon, 19 Aug 2024 02:14:25 GMT
- Title: C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection
- Authors: Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, Yunchao Wei
- Abstract summary: We introduce Category Common Prompt CLIP, which integrates the category common prompt into the text encoder to inject category-related concepts into the image encoder.
Our method achieves a 12.41% improvement in detection accuracy compared to the original CLIP, without introducing additional parameters during testing.
- Score: 98.34703790782254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work focuses on AIGC detection, aiming to develop universal detectors capable of identifying various types of forged images. Recent studies have found that large pre-trained models such as CLIP, paired with linear classifiers, are effective for generalizable deepfake detection. However, two critical issues remain unresolved: 1) understanding why CLIP features are effective for deepfake detection through a linear classifier; and 2) exploring the detection potential of CLIP. In this study, we delve into the underlying mechanisms of CLIP's detection capabilities by decoding its detection features into text and performing word frequency analysis. Our findings indicate that CLIP detects deepfakes by recognizing similar concepts (Fig. 1a). Building on this insight, we introduce Category Common Prompt CLIP, called C2P-CLIP, which integrates the category common prompt into the text encoder to inject category-related concepts into the image encoder, thereby enhancing detection performance (Fig. 1b). Our method achieves a 12.41% improvement in detection accuracy compared to the original CLIP, without introducing additional parameters during testing. Comprehensive experiments conducted on two widely used datasets, encompassing 20 generation models, validate the efficacy of the proposed method, demonstrating state-of-the-art performance. The code is available at https://github.com/chuangchuangtan/C2P-CLIP-DeepfakeDetection
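The abstract credits the baseline's generalization to a linear classifier trained on frozen CLIP image features. As a rough illustration of that baseline setup (not the released C2P-CLIP code), the sketch below fits a logistic-regression probe on CLIP ViT-L/14 image embeddings using the open_clip and scikit-learn packages; the directory layout and label convention are placeholder assumptions.

```python
# Minimal sketch: linear probe on frozen CLIP image features for real/fake detection.
# Assumes open_clip_torch, scikit-learn, and a hypothetical train/real, train/fake layout;
# this is an illustration of the baseline described in the abstract, not the paper's code.
import glob
import torch
import open_clip
from PIL import Image
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms("ViT-L-14", pretrained="openai")
model = model.to(device).eval()

@torch.no_grad()
def embed(paths):
    """Encode image paths into L2-normalized CLIP image features."""
    feats = []
    for p in paths:
        x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0).to(device)
        f = model.encode_image(x)
        feats.append((f / f.norm(dim=-1, keepdim=True)).cpu())
    return torch.cat(feats).numpy()

real_paths = glob.glob("train/real/*.png")   # placeholder paths
fake_paths = glob.glob("train/fake/*.png")
X = embed(real_paths + fake_paths)
y = [0] * len(real_paths) + [1] * len(fake_paths)   # 0 = real, 1 = fake

clf = LogisticRegression(max_iter=1000).fit(X, y)   # the "linear classifier" from the abstract
print("train accuracy:", clf.score(X, y))
```

C2P-CLIP itself goes further, feeding category common prompts through the text encoder to inject category-related concepts into the image encoder; the probe above only sketches the frozen-CLIP baseline it is compared against.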
Related papers
- Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection [6.4348035950413]
We present the first generation-based model using CLIP for zero-shot HOI detection, coined HOIGen.
We develop a CLIP-injected feature generator for producing human, object, and union features.
To enrich the HOI scores, we construct a generative prototype bank in a pairwise HOI recognition branch, and a multi-knowledge prototype bank in an image-wise HOI recognition branch.
arXiv Detail & Related papers (2024-08-12T08:02:37Z)
- Spectral Prompt Tuning: Unveiling Unseen Classes for Zero-Shot Semantic Segmentation [20.880942041889444]
We propose SPT-SEG, a one-stage approach that improves CLIP's adaptability from image to pixel.
Specifically, we introduce Spectral Prompt Tuning (SPT), incorporating spectral prompts into the CLIP visual encoder's shallow layers.
We demonstrate the superiority of our method over state-of-the-art approaches, performing well across all classes and particularly excelling in handling unseen classes.
arXiv Detail & Related papers (2023-12-20T04:27:13Z)
- HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models [30.279621764192843]
Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions.
Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction prior for HOI detectors.
We propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization.
arXiv Detail & Related papers (2023-03-28T07:54:54Z)
- GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning [55.77244064907146]
The one-stage detector GridCLIP learns grid-level representations suited to the intrinsic principle of one-stage detection learning.
Experiments show that the learned CLIP-based grid-level representations boost the performance of undersampled (infrequent and novel) categories.
arXiv Detail & Related papers (2023-03-16T12:06:02Z)
- CLIP-guided Prototype Modulating for Few-shot Action Recognition [49.11385095278407]
This work aims to transfer the powerful multimodal knowledge of CLIP to alleviate the inaccurate prototype estimation issue.
We present a CLIP-guided prototype modulating framework called CLIP-FSAR, which consists of a video-text contrastive objective and a prototype modulation.
arXiv Detail & Related papers (2023-03-06T09:17:47Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose estimation by a clear margin.
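The entry above hinges on a classification head whose weights are initialized with language embeddings of the class names. The sketch below illustrates one way to do that initialization using CLIP's text encoder via open_clip; the class names, prompt template, and backbone are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: initialize a linear classification head from text embeddings of class names.
# Class list, prompt template, and backbone are hypothetical placeholders.
import torch
import torch.nn as nn
import open_clip

model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

class_names = ["riding a horse", "holding a cup", "cutting a cake"]   # hypothetical HOI labels
prompts = [f"a photo of a person {c}" for c in class_names]

with torch.no_grad():
    text_feats = model.encode_text(tokenizer(prompts))
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

# Linear head whose rows start as the normalized language embeddings of each class,
# so semantic correlation among classes is present before any training step.
head = nn.Linear(text_feats.shape[1], len(class_names), bias=False)
head.weight.data.copy_(text_feats)

# Image features of matching dimension can then be scored: logits = head(image_features)
```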
arXiv Detail & Related papers (2022-03-10T23:35:00Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We make use of feature-preserving autoencoder filtering, together with the self-similarity of the support set, to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore such detection for few-shot classifiers.
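The entry above relies on the self-similarity of a support set: genuine supports of a class embed close to one another, so a support with unusually low average similarity to its peers is suspicious. The sketch below illustrates that check only; the feature extractor and threshold are assumptions, and the paper additionally uses feature-preserving autoencoder filtering.

```python
# Sketch: flag a potentially adversarial support example via low self-similarity.
# Any per-image feature extractor can be plugged in; the threshold is an assumption.
import torch
import torch.nn.functional as F

def self_similarity_scores(support_feats: torch.Tensor) -> torch.Tensor:
    """support_feats: (n, d) features of one class's support set.
    Returns each example's mean cosine similarity to the other supports."""
    feats = F.normalize(support_feats, dim=-1)
    sim = feats @ feats.t()                      # (n, n) cosine-similarity matrix
    sim.fill_diagonal_(0.0)                      # ignore similarity to itself
    return sim.sum(dim=1) / (feats.shape[0] - 1)

def flag_suspicious(support_feats: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    scores = self_similarity_scores(support_feats)
    return (scores < threshold).nonzero(as_tuple=True)[0]   # indices of low-similarity supports

# Example with random features standing in for a real embedding network:
feats = torch.randn(5, 128)
print(flag_suspicious(feats))
```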
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [69.75559390700887]
This paper explores a relatively less-studied methodology based on classification.
We propose new techniques to push its frontier in two aspects.
Experiments and visual analysis on large-scale public datasets for aerial images show the effectiveness of our approach.
arXiv Detail & Related papers (2020-11-19T05:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.