Deepfake Video Detection with Spatiotemporal Dropout Transformer
- URL: http://arxiv.org/abs/2207.06612v1
- Date: Thu, 14 Jul 2022 02:04:42 GMT
- Title: Deepfake Video Detection with Spatiotemporal Dropout Transformer
- Authors: Daichi Zhang, Fanzhao Lin, Yingying Hua, Pengju Wang, Dan Zeng,
Shiming Ge
- Abstract summary: This paper proposes a simple yet effective patch-level approach to facilitate deepfake video detection via a dropout transformer.
The approach reorganizes each input video into bag of patches that is then fed into a vision transformer to achieve robust representation.
- Score: 32.577096083927884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the abuse of deepfake technology has caused serious concerns recently,
how to detect deepfake videos is still a challenge due to the high
photo-realistic synthesis of each frame. Existing image-level approaches often
focus on single frame and ignore the spatiotemporal cues hidden in deepfake
videos, resulting in poor generalization and robustness. The key of a
video-level detector is to fully exploit the spatiotemporal inconsistency
distributed in local facial regions across different frames in deepfake videos.
Inspired by that, this paper proposes a simple yet effective patch-level
approach to facilitate deepfake video detection via spatiotemporal dropout
transformer. The approach reorganizes each input video into bag of patches that
is then fed into a vision transformer to achieve robust representation.
Specifically, a spatiotemporal dropout operation is proposed to fully explore
patch-level spatiotemporal cues and serve as effective data augmentation to
further enhance model's robustness and generalization ability. The operation is
flexible and can be easily plugged into existing vision transformers. Extensive
experiments demonstrate the effectiveness of our approach against 25
state-of-the-arts with impressive robustness, generalizability, and
representation ability.
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmailing, extorsion and financial fraud.
In our research we propose to use geometric-fakeness features (GFF) that characterize a dynamic degree of a face presence in a video.
We employ our approach to analyze videos with multiple faces that are simultaneously present in a video.
arXiv Detail & Related papers (2024-10-10T13:10:34Z) - The Tug-of-War Between Deepfake Generation and Detection [4.62070292702111]
Multimodal generative models are rapidly evolving, leading to a surge in the generation of realistic video and audio.
Deepfake videos, which can convincingly impersonate individuals, have particularly garnered attention due to their potential misuse.
This survey paper examines the dual landscape of deepfake video generation and detection, emphasizing the need for effective countermeasures.
arXiv Detail & Related papers (2024-07-08T17:49:41Z) - GRACE: Graph-Regularized Attentive Convolutional Entanglement with Laplacian Smoothing for Robust DeepFake Video Detection [7.591187423217017]
This paper introduces a novel method for robust DeepFake video detection based on graph convolutional network with graph Laplacian.
The proposed method delivers state-of-the-art performance in DeepFake video detection under noisy face sequences.
arXiv Detail & Related papers (2024-06-28T14:17:16Z) - Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories [10.913345858983275]
Deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals.
In this paper, we propose a deepfake video detection method based on 3Dtemporal motion features.
Our method yields satisfactory results and showcases its potential for practical applications.
arXiv Detail & Related papers (2024-04-28T11:48:13Z) - AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting
Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos only utilize visual modality or audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z) - Undercover Deepfakes: Detecting Fake Segments in Videos [1.2609216345578933]
deepfake generation is a new paradigm of deepfakes which are mostly real videos altered slightly to distort the truth.
In this paper, we present a deepfake detection method that can address this issue by performing deepfake prediction at the frame and video levels.
In particular, the paradigm we address will form a powerful tool for the moderation of deepfakes, where human oversight can be better targeted to the parts of videos suspected of being deepfakes.
arXiv Detail & Related papers (2023-05-11T04:43:10Z) - Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z) - Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption [94.5031244215761]
We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
arXiv Detail & Related papers (2022-07-21T10:42:34Z) - Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are commonly indistinguishable from real to human eyes.
We propose a novel fake detection that is designed to re-synthesize testing images and extract visual cues for detection.
We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z) - Sharp Multiple Instance Learning for DeepFake Video Detection [54.12548421282696]
We introduce a new problem of partial face attack in DeepFake video, where only video-level labels are provided but not all the faces in the fake videos are manipulated.
A sharp MIL (S-MIL) is proposed which builds direct mapping from instance embeddings to bag prediction.
Experiments on FFPMS and widely used DFDC dataset verify that S-MIL is superior to other counterparts for partially attacked DeepFake video detection.
arXiv Detail & Related papers (2020-08-11T08:52:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.