Spatio-temporal Features for Generalized Detection of Deepfake Videos
- URL: http://arxiv.org/abs/2010.11844v1
- Date: Thu, 22 Oct 2020 16:28:50 GMT
- Title: Spatio-temporal Features for Generalized Detection of Deepfake Videos
- Authors: Ipek Ganiyusufoglu, L. Minh Ngô, Nedko Savov, Sezer Karaoglu, Theo
Gevers
- Abstract summary: We propose spatio-temporal features, modeled by 3D CNNs, to extend the capabilities to detect new sorts of deepfake videos.
We show that our approach outperforms existing methods in terms of generalization capabilities.
- Score: 12.453288832098314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For deepfake detection, video-level detectors have not been explored as
extensively as image-level detectors, which do not exploit temporal data. In
this paper, we empirically show that existing approaches on image and sequence
classifiers generalize poorly to new manipulation techniques. To this end, we
propose spatio-temporal features, modeled by 3D CNNs, to extend the
generalization capabilities to detect new sorts of deepfake videos. We show
that spatial features learn distinct deepfake-method-specific attributes, while
spatio-temporal features capture shared attributes between deepfake methods. We
provide an in-depth analysis of how the sequential and spatio-temporal video
encoders are utilizing temporal information using DFDC dataset
arXiv:2006.07397. We thereby show that our approach captures local
spatio-temporal relations and inconsistencies in deepfake videos, while
existing sequence encoders are largely insensitive to them. Through large-scale
experiments conducted on the FaceForensics++ arXiv:1901.08971 and Deeper
Forensics arXiv:2001.03024 datasets, we show that our approach outperforms
existing methods in terms of generalization capabilities.
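The core operation behind the spatio-temporal (3D-CNN) encoders discussed in the abstract can be sketched in a few lines. The kernel below is a hypothetical, hand-picked temporal-difference filter chosen purely for illustration; a trained 3D CNN would learn its kernels from data, and real models operate on far larger tensors.

```python
# Minimal sketch of a 3D (spatio-temporal) convolution over a (T, H, W)
# video, pure Python for clarity. Illustrative only, not the paper's model.

def conv3d(video, kernel):
    """Valid 3D convolution of a (T, H, W) video with a (kt, kh, kw) kernel."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            frame.append(row)
        out.append(frame)
    return out

# A purely temporal kernel: responds to frame-to-frame change at each pixel.
temporal_diff = [[[-1.0]], [[1.0]]]  # shape (2, 1, 1)

# Synthetic 3-frame, 2x2 "video": static except one pixel flickers in frame 1,
# a toy stand-in for a local spatio-temporal inconsistency.
video = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 5.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]

response = conv3d(video, temporal_diff)
# The filter fires only where pixel values change between consecutive frames;
# a 2D image-level detector, seeing one frame at a time, cannot respond to this.
print(response)  # → [[[0.0, 5.0], [0.0, 0.0]], [[0.0, -5.0], [0.0, 0.0]]]
```

This is the sense in which spatio-temporal features can capture inconsistencies that are shared across deepfake methods: the filter reacts to *how* pixels change over time, not to method-specific spatial artifacts.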
Related papers
- DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection [18.116004258266535]
A transformer-based framework for Diffusion Inconsistency Learning (DIP) is proposed, which exploits directional inconsistencies for deepfake video detection.
Our method could effectively identify forgery clues and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-10-31T06:26:00Z) - Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL), based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z) - Exploring Spatial-Temporal Features for Deepfake Detection and
Localization [0.0]
We propose a deepfake detection network that simultaneously explores spatial and temporal features for detecting and localizing forged regions.
Specifically, we design a new Anchor-Mesh Motion (AMM) algorithm to extract temporal (motion) features by modeling the precise geometric movements of the facial micro-expression.
The superiority of our ST-DDL network is verified by experimental comparisons with several state-of-the-art competitors.
arXiv Detail & Related papers (2022-10-28T03:38:49Z) - Multimodal Graph Learning for Deepfake Detection [10.077496841634135]
Existing deepfake detectors face several challenges in achieving robustness and generalization.
We propose a novel framework, namely Multimodal Graph Learning (MGL), that leverages information from multiple modalities.
Our proposed method aims to effectively identify and utilize distinguishing features for deepfake detection.
arXiv Detail & Related papers (2022-09-12T17:17:49Z) - Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z) - Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption [94.5031244215761]
We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
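The pseudo-fake idea above can be illustrated with a toy perturbation: corrupt a real video so its spatio-temporal regularity breaks, producing a training negative without any actual deepfake footage. The patch-copy perturbation below is a hypothetical stand-in chosen for simplicity; the paper's Pseudo-fake Generator is considerably more elaborate.

```python
# Toy "pseudo-fake" generator: copy a random spatial patch from frame t-1
# into frame t, breaking frame-to-frame consistency in one local region.
# Illustrative sketch only, not the method from the paper.
import copy
import random

def make_pseudo_fake(video, patch=1, seed=0):
    """Return a perturbed copy of a (T, H, W) video plus the (t, y, x)
    location where temporal regularity was disrupted."""
    rng = random.Random(seed)
    fake = copy.deepcopy(video)
    t = rng.randrange(1, len(video))          # frame to corrupt (t >= 1)
    H, W = len(video[0]), len(video[0][0])
    y = rng.randrange(0, H - patch + 1)
    x = rng.randrange(0, W - patch + 1)
    for dy in range(patch):
        for dx in range(patch):
            fake[t][y + dy][x + dx] = video[t - 1][y + dy][x + dx]
    return fake, (t, y, x)

# A video whose frames differ (frame t is filled with the value t), so the
# copied patch is a visible temporal inconsistency.
video = [[[float(t)] * 2 for _ in range(2)] for t in range(3)]
fake, loc = make_pseudo_fake(video)
t, y, x = loc
# One pixel of frame t now repeats the previous frame's value: a local
# regularity disruption that never occurs in the untouched real video.
```

A detector trained to separate such perturbed clips from untouched ones never needs to see a real deepfake, which is the source of the generalization benefit the summary describes.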
arXiv Detail & Related papers (2022-07-21T10:42:34Z) - Delving into Sequential Patches for Deepfake Detection [64.19468088546743]
Recent advances in face forgery techniques produce nearly untraceable deepfake videos, which could be leveraged with malicious intentions.
Previous studies have identified the importance of local low-level cues and temporal information for generalizing well across deepfake methods.
We propose the Local- & Temporal-aware Transformer-based Deepfake Detection framework, which adopts a local-to-global learning protocol.
arXiv Detail & Related papers (2022-07-06T16:46:30Z) - Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches contribute to exploring the specific artifacts in deepfake videos.
We propose to perform the deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
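The voice-face matching intuition above reduces to scoring the agreement between an audio embedding and a face embedding: a mismatch suggests the face (or voice) was swapped. The fixed vectors and the threshold below are hypothetical placeholders; in the paper, the embeddings come from learned encoders.

```python
# Sketch of deepfake detection as voice-face agreement scoring.
# Embeddings and threshold are illustrative assumptions, not the paper's.
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def is_deepfake(voice_emb, face_emb, threshold=0.5):
    """Flag the clip when the voice and face embeddings disagree."""
    return cosine(voice_emb, face_emb) < threshold

# Matched identity: similar embeddings, clip is not flagged.
print(is_deepfake([1.0, 0.2, 0.0], [0.9, 0.3, 0.1]))  # → False
# Swapped face: dissimilar embeddings, clip is flagged.
print(is_deepfake([1.0, 0.2, 0.0], [0.0, 0.1, 1.0]))  # → True
```

Unlike artifact-based detectors, this check does not depend on any specific forgery method's visual fingerprints, which is why the summary calls it an unexplored, complementary view.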
arXiv Detail & Related papers (2022-03-04T09:08:50Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Deepfake Detection using Spatiotemporal Convolutional Networks [0.0]
Most deepfake detection methods use only individual frames and therefore fail to learn from temporal information.
We created a benchmark of performance using Celeb-DF dataset.
Our methods outperformed state-of-the-art frame-based detection methods.
arXiv Detail & Related papers (2020-06-26T01:32:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.