Enhancing General Face Forgery Detection via Vision Transformer with
Low-Rank Adaptation
- URL: http://arxiv.org/abs/2303.00917v2
- Date: Mon, 27 Mar 2023 07:42:24 GMT
- Authors: Chenqi Kong, Haoliang Li, Shiqi Wang
- Abstract summary: Forged faces pose pressing security concerns such as fake news, fraud, and impersonation.
This paper designs a more general fake face detection model based on the vision transformer (ViT) architecture.
The proposed method achieves state-of-the-art detection performance in both cross-manipulation and cross-dataset evaluations.
- Score: 31.780516471483985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, forged faces pose pressing security concerns such as fake
news, fraud, and impersonation. Despite the demonstrated success in intra-domain
face forgery detection, existing detection methods lack generalization
capability and tend to suffer from dramatic performance drops when deployed to
unforeseen domains. To mitigate this issue, this paper designs a more general
fake face detection model based on the vision transformer (ViT) architecture. In
the training phase, the pretrained ViT weights are frozen, and only the
Low-Rank Adaptation (LoRA) modules are updated. Additionally, the Single Center
Loss (SCL) is applied to supervise the training process, further improving the
generalization capability of the model. The proposed method achieves
state-of-the-art detection performance in both cross-manipulation and
cross-dataset evaluations.
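
To make the "frozen weights, trainable LoRA modules" idea concrete, here is a minimal sketch of a single LoRA-augmented linear layer in plain Python. It is an illustration, not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, and the toy sizes are hypothetical, and the abstract does not say which ViT layers receive LoRA modules. The initialization (Gaussian `A`, zero `B`, scale `alpha / rank`) follows the original LoRA formulation, so the adapted layer starts out identical to the frozen one.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration: only A and B would receive gradient updates;
    W stays fixed at its pretrained value.
    """

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, never updated
        # A: rank x d_in, small Gaussian initialization
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B: d_out x rank, zero initialization so the update starts at zero
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameter_count(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    d = 8
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    layer = LoRALinear(W, rank=2)
    x = [float(i) for i in range(d)]
    # Before any training step the LoRA path contributes nothing.
    print(layer.forward(x) == matvec(W, x))
    # Trainable parameters versus the size of the frozen weight.
    print(layer.trainable_parameter_count(), "vs", d * d)
```

In this toy setting the layer trains 2 * (8 + 8) = 32 parameters instead of the 64 in the full weight; the saving grows quickly as the layer width increases while the rank stays small.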
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- A visualization method for data domain changes in CNN networks and the optimization method for selecting thresholds in classification tasks [1.1118946307353794]
Face Anti-Spoofing (FAS) has played a crucial role in preserving the security of face recognition technology.
With the rise of counterfeit face generation techniques, the challenge posed by digitally edited faces to face anti-spoofing is escalating.
We propose a visualization method that intuitively reflects the training outcomes of models by visualizing the prediction results on datasets.
arXiv Detail & Related papers (2024-04-19T03:12:17Z)
- MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection [54.545054873239295]
Deepfakes have recently raised significant trust issues and security concerns among the public.
ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance.
This work introduces Mixture-of-Experts modules for Face Forgery Detection (MoE-FFD), a generalized yet parameter-efficient ViT-based approach.
arXiv Detail & Related papers (2024-04-12T13:02:08Z)
- S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens [45.06704981913823]
Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces.
We propose a novel Statistical Adapter (S-Adapter) that gathers local discriminative and statistical information from localized token histograms.
To further improve the generalization of the statistical tokens, we propose a novel Token Style Regularization (TSR).
Our experimental results demonstrate that our proposed S-Adapter and TSR provide significant benefits in both zero-shot and few-shot cross-domain testing, outperforming state-of-the-art methods on several benchmark tests.
arXiv Detail & Related papers (2023-09-07T22:36:22Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Self-Supervised Graph Transformer for Deepfake Detection [1.8133635752982105]
Deepfake detection methods have shown promising results in recognizing forgeries within a given dataset.
A deepfake detection system must remain impartial to forgery types, appearance, and quality for guaranteed generalizable detection performance.
This study introduces a deepfake detection framework, leveraging a self-supervised pre-training model that delivers exceptional generalization ability.
arXiv Detail & Related papers (2023-07-27T17:22:41Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Test-time Adaptation with Slot-Centric Models [63.981055778098444]
Slot-TTA is a semi-supervised scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.
We show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors, and alternative test-time adaptation methods.
arXiv Detail & Related papers (2022-03-21T17:59:50Z)
- Benchmarking Detection Transfer Learning with Vision Transformers [60.97703494764904]
The complexity of object detection methods can make benchmarking non-trivial when new architectures, such as Vision Transformer (ViT) models, arrive.
We present training techniques that overcome these challenges, enabling the use of standard ViT models as the backbone of Mask R-CNN.
Our results show that recent masking-based unsupervised learning methods may, for the first time, provide convincing transfer learning improvements on COCO.
arXiv Detail & Related papers (2021-11-22T18:59:15Z)
- On the Effectiveness of Vision Transformers for Zero-shot Face Anti-Spoofing [7.665392786787577]
In this work, we use transfer learning from the vision transformer model for the zero-shot anti-spoofing task.
The proposed approach outperforms the state-of-the-art methods in the zero-shot protocols in the HQ-WMCA and SiW-M datasets by a large margin.
arXiv Detail & Related papers (2020-11-16T15:14:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.