Latent Spatiotemporal Adaptation for Generalized Face Forgery Video Detection
- URL: http://arxiv.org/abs/2309.04795v2
- Date: Thu, 24 Oct 2024 02:17:11 GMT
- Title: Latent Spatiotemporal Adaptation for Generalized Face Forgery Video Detection
- Authors: Daichi Zhang, Zihao Xiao, Jianmin Li, Shiming Ge
- Abstract summary: We propose a Latent Spatiotemporal Adaptation (LAST) approach to facilitate generalized face forgery video detection.
We first model the spatiotemporal patterns of face videos by incorporating a lightweight CNN to extract local spatial features of each frame.
Then we learn the long-term spatiotemporal representations of videos in latent space, which should contain more clues than in pixel space.
- Score: 22.536129731902783
- Abstract: Face forgery videos have caused severe public concerns, and many detectors have been proposed. However, most of these detectors suffer from limited generalization when detecting videos from unknown distributions, such as from unseen forgery methods. In this paper, we find that different forgery videos have distinct spatiotemporal patterns, which may be the key to generalization. To leverage this finding, we propose a Latent Spatiotemporal Adaptation (LAST) approach to facilitate generalized face forgery video detection. The key idea is to optimize the detector adaptive to the spatiotemporal patterns of unknown videos in latent space to improve the generalization. Specifically, we first model the spatiotemporal patterns of face videos by incorporating a lightweight CNN to extract local spatial features of each frame and then cascading a vision transformer to learn the long-term spatiotemporal representations in latent space, which should contain more clues than in pixel space. Then by optimizing a transferable linear head to perform the usual forgery detection task on known videos and recover the spatiotemporal clues of unknown target videos in a semi-supervised manner, our detector could flexibly adapt to unknown videos' spatiotemporal patterns, leading to improved generalization. Additionally, to eliminate the influence of specific forgery videos, we pre-train our CNN and transformer only on real videos with two simple yet effective self-supervised tasks: reconstruction and contrastive learning in latent space and keep them frozen during fine-tuning. Extensive experiments on public datasets demonstrate that our approach achieves state-of-the-art performance against other competitors with impressive generalization and robustness.
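The abstract describes a three-part pipeline: a lightweight CNN encodes each frame, a cascaded vision transformer aggregates the per-frame latents into long-term spatiotemporal representations, and a transferable linear head performs the forgery classification while the backbone stays frozen. The minimal PyTorch sketch below illustrates only that wiring; the layer sizes, pooling choices, and class names (`FrameEncoder`, `LatentSpatiotemporalDetector`) are illustrative assumptions, and the paper's self-supervised pre-training and semi-supervised adaptation objectives are omitted.

```python
# Hypothetical sketch of the frame-CNN -> transformer -> linear-head pipeline
# described in the abstract. Sizes, names, and pooling are assumptions, not
# the authors' released code.
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """Lightweight per-frame CNN that maps each frame to a latent vector."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B*T, 3, H, W) -> (B*T, dim)
        return self.conv(frames).flatten(1)


class LatentSpatiotemporalDetector(nn.Module):
    """Frozen CNN + transformer backbone with a transferable linear head."""

    def __init__(self, dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        self.frame_encoder = FrameEncoder(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 2)  # real vs. fake

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        tokens = self.frame_encoder(clip.flatten(0, 1)).view(b, t, -1)
        latent = self.temporal_encoder(tokens)   # long-term spatiotemporal latent
        return self.head(latent.mean(dim=1))     # pooled clip-level prediction


if __name__ == "__main__":
    model = LatentSpatiotemporalDetector()
    # Keep the backbone frozen, as the abstract describes; only the linear
    # head would be optimized during adaptation.
    for p in model.frame_encoder.parameters():
        p.requires_grad = False
    for p in model.temporal_encoder.parameters():
        p.requires_grad = False
    logits = model(torch.randn(2, 8, 3, 112, 112))  # 2 clips of 8 frames
    print(logits.shape)  # torch.Size([2, 2])
```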
Related papers
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z)
- Learning Natural Consistency Representation for Face Forgery Video Detection [23.53549629885891]
We propose to learn the Natural Consistency representation (NACO) of real face videos in a self-supervised manner.
Our method outperforms other state-of-the-art methods with impressive robustness.
arXiv Detail & Related papers (2024-07-15T09:00:02Z)
- Dynamic Erasing Network Based on Multi-Scale Temporal Features for Weakly Supervised Video Anomaly Detection [103.92970668001277]
We propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection.
We first propose a multi-scale temporal modeling module, capable of extracting features from segments of varying lengths.
Then, we design a dynamic erasing strategy, which dynamically assesses the completeness of the detected anomalies.
arXiv Detail & Related papers (2023-12-04T09:40:11Z)
- AltFreezing for More General Video Face Forgery Detection [138.5732617371004]
We propose to capture both spatial and temporal artifacts in one model for face forgery detection.
We present a novel training strategy called AltFreezing for more general face forgery detection.
arXiv Detail & Related papers (2023-07-17T08:24:58Z)
- Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption [94.5031244215761]
We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
arXiv Detail & Related papers (2022-07-21T10:42:34Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection [27.433162897608543]
We propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection.
It contains three key components, i.e., a convolutional encoder to capture the spatial information of input clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to predict the future frame.
arXiv Detail & Related papers (2021-07-29T03:07:25Z)
- Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks [54.82488484053263]
Deep neural networks for video classification may be subjected to adversarial manipulation.
We present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation.
The attack was implemented on several target models and the transferability of the attack was demonstrated.
arXiv Detail & Related papers (2020-02-12T17:58:12Z)
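The last entry attacks video classifiers with a flickering temporal perturbation, i.e., a spatially uniform offset applied per frame. Below is a minimal sketch of one way such a perturbation could be optimized, assuming a differentiable video classifier that accepts clips of shape (B, T, 3, H, W); the function name, step sizes, and the plain gradient-ascent loop are illustrative assumptions, and the paper's over-the-air and imperceptibility constraints are omitted.

```python
# Hypothetical sketch of a flickering-style perturbation: one RGB offset per
# frame (constant across the spatial dimensions), tuned by gradient ascent to
# raise the classifier's loss. Hyperparameters are assumptions for illustration.
import torch
import torch.nn.functional as F


def flickering_attack(model, clip, label, steps=50, lr=1e-2, eps=0.05):
    # clip: (1, T, 3, H, W) in [0, 1]; delta holds one RGB value per frame.
    delta = torch.zeros(1, clip.shape[1], 3, 1, 1, requires_grad=True)
    for _ in range(steps):
        logits = model(torch.clamp(clip + delta, 0.0, 1.0))
        loss = F.cross_entropy(logits, label)   # maximize the true-class loss
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()     # FGSM-style ascent step
            delta.clamp_(-eps, eps)             # keep the flicker small
            delta.grad.zero_()
    return (clip + delta).clamp(0.0, 1.0).detach()
```

As a usage note, this sketch could be run against the detector sketch earlier in this page, e.g. `adv = flickering_attack(model, clip, torch.tensor([1]))` with `clip` of shape (1, 8, 3, 112, 112) in [0, 1].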