Towards General Visual-Linguistic Face Forgery Detection
- URL: http://arxiv.org/abs/2307.16545v2
- Date: Wed, 7 Feb 2024 07:52:10 GMT
- Title: Towards General Visual-Linguistic Face Forgery Detection
- Authors: Ke Sun, Shen Chen, Taiping Yao, Haozhe Yang, Xiaoshuai Sun, Shouhong
Ding and Rongrong Ji
- Abstract summary: Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, using digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotations.
- Score: 95.73987327101143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deepfakes are realistic face manipulations that can pose serious
threats to security, privacy, and trust. Existing methods mostly treat this
task as binary classification, using digital labels or mask signals to train
the detection model. We argue that such supervision lacks semantic information
and interpretability. To address this issue, in this paper we propose a novel
paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses
fine-grained sentence-level prompts as annotations. Since text annotations are
not available in current deepfake datasets, VLFFD first generates mixed
forgery images with corresponding fine-grained prompts via the Prompt Forgery
Image Generator (PFIG). The model is then jointly trained on the fine-grained
mixed data and the coarse-grained original data with the Coarse-and-Fine
Co-training framework (C2F), gaining greater generalization and
interpretability. Experiments show that the proposed method improves existing
detection models on several challenging benchmarks. Furthermore, we have
integrated our method with large multimodal models, achieving noteworthy
results. This integration not only enhances the performance of the VLFFD
paradigm but also underscores its versatility and adaptability when combined
with advanced multimodal technologies, highlighting its potential for
tackling the evolving challenges of deepfake detection.
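To make the pipeline above concrete, the following is a minimal PyTorch-style sketch of the coarse-and-fine co-training idea: a coarse binary real/fake loss on the original data plus a fine-grained image-text contrastive loss aligning PFIG-style mixed images with their sentence-level prompts. All names here (c2f_step, the encoders, the weighting alpha) are hypothetical illustrations of the paradigm, not the authors' implementation.

```python
# A hypothetical sketch of Coarse-and-Fine Co-training (C2F), assuming
# CLIP-like image/text encoders; not the authors' actual code.
import torch
import torch.nn.functional as F

def c2f_step(image_encoder, text_encoder, classifier,
             coarse_images, coarse_labels,   # original data with 0/1 labels
             mixed_images, prompt_tokens,    # PFIG-style mixed images + prompts
             alpha=0.5, temperature=0.07):
    # Coarse branch: standard binary real/fake classification.
    coarse_logits = classifier(image_encoder(coarse_images))
    loss_coarse = F.cross_entropy(coarse_logits, coarse_labels)

    # Fine branch: align each mixed image with its fine-grained prompt
    # via a symmetric InfoNCE-style contrastive loss.
    img = F.normalize(image_encoder(mixed_images), dim=-1)
    txt = F.normalize(text_encoder(prompt_tokens), dim=-1)
    sim = img @ txt.t() / temperature
    targets = torch.arange(sim.size(0), device=sim.device)
    loss_fine = (F.cross_entropy(sim, targets) +
                 F.cross_entropy(sim.t(), targets)) / 2

    # Joint objective: coarse supervision plus weighted fine-grained alignment.
    return loss_coarse + alpha * loss_fine
```

The weighting alpha simply trades off the two granularities in this sketch; the actual C2F framework may couple the coarse and fine branches more tightly.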
Related papers
- MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.
Although existing approaches mainly capture face forgery patterns using the image modality, other modalities such as fine-grained noise and text are not fully explored.
We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image and noise modalities (see the sketch after this entry).
arXiv Detail & Related papers (2024-09-15T13:08:59Z)
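As a rough illustration of the image-noise idea, the sketch below derives a high-frequency noise view with a simple fixed high-pass filter and encodes it alongside the RGB image. The filter choice and the concatenation fusion are assumptions for illustration; MFCLIP's actual noise extraction and fusion are defined in the paper itself.

```python
# An illustrative pairing of an RGB view with a fine-grained noise view;
# the high-pass kernel and late fusion are assumptions, not MFCLIP's design.
import torch
import torch.nn.functional as F

# Simple 3x3 high-pass kernel as a stand-in for an SRM-style noise extractor.
HIGH_PASS = torch.tensor([[-1., -1., -1.],
                          [-1.,  8., -1.],
                          [-1., -1., -1.]]) / 8.0

def noise_view(images):                          # images: (B, 3, H, W)
    kernel = HIGH_PASS.expand(3, 1, 3, 3)        # one depthwise filter per channel
    return F.conv2d(images, kernel, padding=1, groups=3)

def image_noise_features(image_encoder, noise_encoder, images):
    rgb_feat = image_encoder(images)                # semantic forgery cues
    noise_feat = noise_encoder(noise_view(images))  # high-frequency traces
    return torch.cat([rgb_feat, noise_feat], dim=-1)
```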
- Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection [16.21235742118949]
We propose a novel approach that repurposes a well-trained Vision-Language Model (VLM) for general deepfake detection.
Motivated by the model reprogramming paradigm, which steers model predictions via data perturbations, our method reprograms a pretrained VLM (a sketch follows this entry).
Our method achieves superior performance at a low cost in trainable parameters, making it a promising approach for real-world applications.
arXiv Detail & Related papers (2024-09-04T12:46:30Z)
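Model reprogramming in this spirit can be sketched as a frozen CLIP-style VLM plus a single learnable additive input perturbation, with real/fake decided by text-prompt similarity. The encode_image interface, the perturbation shape, and the prompt pair are assumptions for illustration, not the paper's exact setup.

```python
# A hypothetical reprogramming wrapper around a frozen CLIP-style VLM;
# only the input perturbation is trained.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReprogrammedVLM(nn.Module):
    def __init__(self, vlm, image_size=224):
        super().__init__()
        self.vlm = vlm.eval()                    # pretrained VLM, kept frozen
        for p in self.vlm.parameters():
            p.requires_grad_(False)
        # The only trainable parameters: an additive perturbation on the input.
        self.delta = nn.Parameter(torch.zeros(1, 3, image_size, image_size))

    def forward(self, images, text_embeds):
        # text_embeds: precomputed embeddings of prompts such as
        # ["a photo of a real face", "a photo of a fake face"]
        img = F.normalize(self.vlm.encode_image(images + self.delta), dim=-1)
        txt = F.normalize(text_embeds, dim=-1)
        return img @ txt.t()                     # logits over {real, fake}
```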
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach to face forgery detection that is both general and parameter-efficient.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains (one plausible realization is sketched after this entry).
We show that the designed model achieves state-of-the-art generalizability with significantly fewer trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
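The forgery-style mixture is not detailed in this summary; one plausible realization, sketched below, interpolates channel-wise feature statistics between samples from different forgery sources, in the spirit of MixStyle. This is an assumption for illustration, not the paper's exact formulation.

```python
# A MixStyle-like stand-in for forgery-style mixture: interpolate per-sample
# feature statistics to diversify forgery source styles.
import torch

def mix_forgery_styles(feats, alpha=0.3):
    # feats: (B, C, H, W) intermediate features from forged images
    mu = feats.mean(dim=(2, 3), keepdim=True)        # per-sample style mean
    sigma = feats.std(dim=(2, 3), keepdim=True) + 1e-6
    normalized = (feats - mu) / sigma                # content with style removed
    perm = torch.randperm(feats.size(0))             # pair with another sample
    lam = torch.distributions.Beta(alpha, alpha).sample(
        (feats.size(0), 1, 1, 1))                    # per-sample mixing weight
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sigma_mix = lam * sigma + (1 - lam) * sigma[perm]
    return normalized * sigma_mix + mu_mix           # re-styled features
```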
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and are not limited to forgery-specific artifacts, and thus generalize more strongly.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network with meta-functional face classification for an enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning [50.7702397913573]
The rapid advancement of photorealistic generators has reached a critical juncture where the discrepancy between authentic and manipulated images is increasingly difficult to discern.
Although a number of face forgery datasets are publicly available, the forged faces are mostly generated with GAN-based synthesis technology.
We propose GenFace, a large-scale, diverse, and fine-grained high-fidelity dataset to facilitate the advancement of deepfake detection.
arXiv Detail & Related papers (2024-02-03T03:13:50Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data (a sketch of the cluster-level pseudo-labelling step follows this entry).
We validate the effectiveness of our method in four adaptation setups, showing that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
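As a rough illustration of cluster-level pseudo-labelling, the sketch below clusters target features with k-means and gives every member of a cluster the class on which the model places the most average probability. The model.features/model.classifier interface and the use of sklearn's KMeans are assumptions for illustration, not the paper's exact procedure.

```python
# A hypothetical cluster-level pseudo-labelling pass for source-free
# adaptation; interfaces and the clustering choice are illustrative.
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def cluster_pseudo_labels(model, target_images, num_classes):
    feats = model.features(target_images)                # (N, D) target features
    probs = model.classifier(feats).softmax(dim=-1).cpu()
    clusters = torch.as_tensor(
        KMeans(n_clusters=num_classes).fit_predict(feats.cpu().numpy()))
    labels = torch.empty(len(probs), dtype=torch.long)
    for c in range(num_classes):                         # assumes no empty cluster
        mask = clusters == c
        # One label per cluster: the class with the highest mean probability.
        labels[mask] = probs[mask].mean(dim=0).argmax()
    return labels
```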
This list is automatically generated from the titles and abstracts of the papers on this site.