Spatio-temporal Features for Generalized Detection of Deepfake Videos
- URL: http://arxiv.org/abs/2010.11844v1
- Date: Thu, 22 Oct 2020 16:28:50 GMT
- Title: Spatio-temporal Features for Generalized Detection of Deepfake Videos
- Authors: Ipek Ganiyusufoglu, L. Minh Ngô, Nedko Savov, Sezer Karaoglu, Theo Gevers
- Abstract summary: We propose spatio-temporal features, modeled by 3D CNNs, to extend the capabilities to detect new sorts of deepfake videos.
We show that our approach outperforms existing methods in terms of generalization capabilities.
- Score: 12.453288832098314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For deepfake detection, video-level detectors have not been explored as
extensively as image-level detectors, which do not exploit temporal data. In
this paper, we empirically show that existing approaches on image and sequence
classifiers generalize poorly to new manipulation techniques. To this end, we
propose spatio-temporal features, modeled by 3D CNNs, to extend the
generalization capabilities to detect new sorts of deepfake videos. We show
that spatial features learn distinct deepfake-method-specific attributes, while
spatio-temporal features capture shared attributes between deepfake methods. We
provide an in-depth analysis of how sequential and spatio-temporal video
encoders utilize temporal information, using the DFDC dataset
arXiv:2006.07397. We show that our approach captures local
spatio-temporal relations and inconsistencies in deepfake videos, while
existing sequence encoders are indifferent to them. Through large-scale
experiments conducted on the FaceForensics++ arXiv:1901.08971 and Deeper
Forensics arXiv:2001.03024 datasets, we show that our approach outperforms
existing methods in terms of generalization capabilities.
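The key idea in the abstract is that 3D convolutions mix information across neighbouring frames, so they can respond to temporal inconsistencies that per-frame (2D) features miss. The following minimal numpy sketch illustrates this mechanism only; it is not the paper's architecture, and the video tensor, kernel, and "flicker" inconsistency are illustrative assumptions.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive 3D convolution (valid padding) over a (T, H, W) video.

    Unlike a per-frame 2D convolution, each output voxel aggregates
    a patch spanning several frames, which is what lets a 3D CNN
    respond to frame-to-frame inconsistencies.
    """
    kt, kh, kw = kernel.shape
    T, H, W = video.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = video[t:t + kt, i:i + kh, j:j + kw]
                out[t, i, j] = np.sum(patch * kernel)
    return out

# A hand-crafted temporal-difference kernel: zero on static content,
# non-zero wherever consecutive frames differ.
temporal_diff = np.zeros((2, 1, 1))
temporal_diff[0, 0, 0] = -1.0
temporal_diff[1, 0, 0] = 1.0

static = np.ones((4, 3, 3))        # static clip: identical frames
flicker = np.ones((4, 3, 3))
flicker[2] += 0.5                  # one brightened frame as a crude temporal artifact

print(np.abs(conv3d_valid(static, temporal_diff)).max())   # 0.0
print(np.abs(conv3d_valid(flicker, temporal_diff)).max())  # 0.5
```

A learned 3D CNN would discover such temporal filters from data rather than hand-crafting them, but the response pattern is the same: static (temporally consistent) content is suppressed, while frame-to-frame irregularities produce a signal.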
Related papers
- Deepfake Detection with Spatio-Temporal Consistency and Attention [46.1135899490656]
Deepfake videos are causing growing concerns among communities due to their ever-increasing realism.
Current methods for detecting forged videos rely mainly on global frame features.
We propose a neural Deepfake detector that focuses on the localized manipulative signatures of the forged videos.
arXiv Detail & Related papers (2025-02-12T08:51:33Z)
- Vulnerability-Aware Spatio-Temporal Learning for Generalizable and Interpretable Deepfake Video Detection [14.586314545834934]
Deepfake videos are highly challenging to detect due to the complex intertwined temporal and spatial artifacts in forged sequences.
Most recent approaches rely on binary classifiers trained on both real and fake data.
We introduce a multi-task learning framework with additional spatial and temporal branches that enable the model to focus on subtle artifacts.
Second, we propose a video-level algorithm that generates pseudo-fake videos with subtle artifacts, providing the model with high-quality samples and ground-truth data.
arXiv Detail & Related papers (2025-01-02T10:21:34Z)
- Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts [57.01985221057047]
This paper introduces a novel method that learns spatio-temporal prompt embeddings for weakly supervised video anomaly detection and localization (WSVADL) based on pre-trained vision-language models (VLMs).
Our method achieves state-of-the-art performance on three public benchmarks for the WSVADL task.
arXiv Detail & Related papers (2024-08-12T03:31:29Z)
- Exploring Spatial-Temporal Features for Deepfake Detection and Localization [0.0]
We propose a Deepfake detection network that simultaneously explores spatial and temporal features for detecting and localizing forged regions.
Specifically, we design a new Anchor-Mesh Motion (AMM) algorithm to extract temporal (motion) features by modeling the precise geometric movements of facial micro-expressions.
The superiority of our ST-DDL network is verified by experimental comparisons with several state-of-the-art competitors.
arXiv Detail & Related papers (2022-10-28T03:38:49Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption [94.5031244215761]
We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
arXiv Detail & Related papers (2022-07-21T10:42:34Z)
- Delving into Sequential Patches for Deepfake Detection [64.19468088546743]
Recent advances in face forgery techniques produce nearly untraceable deepfake videos, which could be leveraged with malicious intentions.
Previous studies have identified the importance of local low-level cues and temporal information for generalizing well across deepfake methods.
We propose the Local- & Temporal-aware Transformer-based Deepfake Detection framework, which adopts a local-to-global learning protocol.
arXiv Detail & Related papers (2022-07-06T16:46:30Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Deepfake Detection using Spatiotemporal Convolutional Networks [0.0]
Most deepfake detection methods use only individual frames and therefore fail to learn from temporal information.
We created a benchmark of performance using Celeb-DF dataset.
Our methods outperformed state-of-the-art frame-based detection methods.
arXiv Detail & Related papers (2020-06-26T01:32:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.