Self-Supervised Transformer with Domain Adaptive Reconstruction for
General Face Forgery Video Detection
- URL: http://arxiv.org/abs/2309.04795v1
- Date: Sat, 9 Sep 2023 13:40:44 GMT
- Title: Self-Supervised Transformer with Domain Adaptive Reconstruction for
General Face Forgery Video Detection
- Authors: Daichi Zhang, Zihao Xiao, Jianmin Li, Shiming Ge
- Abstract summary: A Self-supervised Transformer cooperating with Contrastive and Reconstruction learning (CoReST) is proposed.
Two specific auxiliary tasks incorporated contrastive and reconstruction learning are designed to enhance the representation learning.
Our proposed method performs even better than the state-of-the-art supervised competitors with impressive generalization.
- Score: 24.619102747582456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Face forgery videos have caused severe social public concern, and various
detectors have been proposed recently. However, most of them are trained in a
supervised manner with limited generalization when detecting videos from
different forgery methods or real source videos. To tackle this issue, we
explore to take full advantage of the difference between real and forgery
videos by only exploring the common representation of real face videos. In this
paper, a Self-supervised Transformer cooperating with Contrastive and
Reconstruction learning (CoReST) is proposed, which is first pre-trained only
on real face videos in a self-supervised manner, and then fine-tuned a linear
head on specific face forgery video datasets. Two specific auxiliary tasks
incorporated contrastive and reconstruction learning are designed to enhance
the representation learning. Furthermore, a Domain Adaptive Reconstruction
(DAR) module is introduced to bridge the gap between different forgery domains
by reconstructing on unlabeled target videos when fine-tuning. Extensive
experiments on public datasets demonstrate that our proposed method performs
even better than the state-of-the-art supervised competitors with impressive
generalization.
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learnstemporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs)
Our method achieves state-of-theart performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - Learning Natural Consistency Representation for Face Forgery Video Detection [23.53549629885891]
We propose to learn the Natural representation (NACO) real face videos in a self-supervised manner.
Our method outperforms other state-of-the-art methods with impressive robustness.
arXiv Detail & Related papers (2024-07-15T09:00:02Z) - Dynamic Erasing Network Based on Multi-Scale Temporal Features for
Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z) - AltFreezing for More General Video Face Forgery Detection [138.5732617371004]
We propose to capture both spatial and unseen temporal artifacts in one model for face forgery detection.
We present a novel training strategy called AltFreezing for more general face forgery detection.
arXiv Detail & Related papers (2023-07-17T08:24:58Z) - Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption [94.5031244215761]
We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
arXiv Detail & Related papers (2022-07-21T10:42:34Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Convolutional Transformer based Dual Discriminator Generative
Adversarial Networks for Video Anomaly Detection [27.433162897608543]
We propose Conversaal Transformer based Dual Discriminator Generative Adrial Networks (CT-D2GAN) to perform unsupervised video anomaly detection.
It contains three key components, i., a convolutional encoder to capture the spatial information of input clips, a temporal self-attention module to encode the temporal dynamics and predict the future frame.
arXiv Detail & Related papers (2021-07-29T03:07:25Z) - Over-the-Air Adversarial Flickering Attacks against Video Recognition
Networks [54.82488484053263]
Deep neural networks for video classification may be subjected to adversarial manipulation.
We present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation.
The attack was implemented on several target models and the transferability of the attack was demonstrated.
arXiv Detail & Related papers (2020-02-12T17:58:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.