Generalizable Deepfake Detection with Phase-Based Motion Analysis
- URL: http://arxiv.org/abs/2211.09363v1
- Date: Thu, 17 Nov 2022 06:28:01 GMT
- Title: Generalizable Deepfake Detection with Phase-Based Motion Analysis
- Authors: Ekta Prashnani, Michael Goebel, B. S. Manjunath
- Abstract summary: We propose PhaseForensics, a DeepFake (DF) video detection method that leverages a phase-based motion representation of temporal dynamics.
We show improved distortion and adversarial robustness, and state-of-the-art cross-dataset generalization, with 91.2% video-level AUC on the challenging CelebDFv2.
- Score: 11.042856247812969
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose PhaseForensics, a DeepFake (DF) video detection method that
leverages a phase-based motion representation of facial temporal dynamics.
Existing methods that rely on temporal inconsistencies for DF detection offer
many advantages over typical frame-based methods. However, they still show
limited cross-dataset generalization and robustness to common distortions.
These shortcomings are partially due to error-prone motion estimation and
landmark tracking, or the susceptibility of the pixel intensity-based features
to spatial distortions and the cross-dataset domain shifts. Our key insight to
overcome these issues is to leverage the temporal phase variations in the
band-pass components of the Complex Steerable Pyramid on face sub-regions. This
not only enables a robust estimate of the temporal dynamics in these regions,
but is also less prone to cross-dataset variations. Furthermore, the band-pass
filters used to compute the local per-frame phase form an effective defense
against the perturbations commonly seen in gradient-based adversarial attacks.
Overall, with PhaseForensics, we show improved distortion and adversarial
robustness, and state-of-the-art cross-dataset generalization, with 91.2%
video-level AUC on the challenging CelebDFv2 (compared with 86.9% for a recent
state-of-the-art method).
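To make the core idea concrete, below is a minimal sketch of phase-based motion features. It uses a single complex Gabor band-pass filter as a stand-in for one band of a Complex Steerable Pyramid; all names and parameter values are illustrative, not taken from PhaseForensics.

```python
# Minimal sketch of phase-based motion features, assuming one complex Gabor
# band-pass filter as a stand-in for one Complex Steerable Pyramid band
# (the paper builds a full pyramid over face sub-regions).
import numpy as np
from scipy.signal import convolve2d

def complex_gabor_kernel(size=15, wavelength=6.0, sigma=3.0, theta=0.0):
    """One oriented complex band-pass kernel (real + imaginary quadrature pair)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(1j * 2 * np.pi * xr / wavelength)

def phase_motion_features(frames, kernel):
    """Per-pixel temporal phase differences; a simple proxy for local motion.

    frames: (T, H, W) float array of grayscale face crops.
    Returns (T-1, H, W) phase deltas wrapped to [-pi, pi).
    """
    phases = []
    for f in frames:
        response = convolve2d(f, kernel, mode="same", boundary="symm")
        phases.append(np.angle(response))
    dphi = np.diff(np.stack(phases), axis=0)
    return (dphi + np.pi) % (2 * np.pi) - np.pi

# Toy usage: a vertically translating random texture yields nonzero deltas.
rng = np.random.default_rng(0)
base = rng.standard_normal((70, 70))
frames = np.stack([base[t:t + 64, 0:64] for t in range(4)])
feats = phase_motion_features(frames, complex_gabor_kernel(theta=np.pi / 2))
print(feats.shape)  # (3, 64, 64)
```

The motion signal lives in the temporal phase differences rather than in pixel intensities, which is the property the abstract credits for distortion robustness.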
Related papers
- Frame-level Temporal Difference Learning for Partial Deepfake Speech Detection [16.923285534924116]
We propose a Temporal Difference Attention Module (TDAM) that redefines partial deepfake detection as identifying unnatural temporal variations.
A dual-level hierarchical difference representation captures temporal irregularities at both fine and coarse scales, while adaptive average pooling preserves essential patterns across variable-length inputs to minimize information loss.
Our TDAM-AvgPool model achieves state-of-the-art performance, with an EER of 0.59% on the PartialSpoof dataset and 0.03% on the HAD dataset, significantly outperforming existing methods without requiring frame-level supervision.
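A hedged sketch of the frame-level temporal-difference idea summarized above; the two difference strides, the fusion, and the pooled length are assumptions, not the paper's configuration.

```python
# Illustrative temporal differences at two scales plus adaptive average
# pooling for variable-length inputs; names here are hypothetical.
import torch
import torch.nn as nn

class TemporalDifferenceFeatures(nn.Module):
    def __init__(self, feat_dim=128, pooled_len=16):
        super().__init__()
        self.proj = nn.Linear(2 * feat_dim, feat_dim)
        self.pool = nn.AdaptiveAvgPool1d(pooled_len)  # handles variable T

    def forward(self, x):
        # x: (B, T, D) per-frame embeddings of an utterance or video.
        fine = x[:, 1:] - x[:, :-1]                       # stride-1 deltas
        coarse = x[:, 2:] - x[:, :-2]                     # stride-2 deltas
        fused = torch.cat([fine[:, 1:], coarse], dim=-1)  # align to T-2
        fused = self.proj(fused)                          # (B, T-2, D)
        pooled = self.pool(fused.transpose(1, 2))         # (B, D, pooled_len)
        return pooled.flatten(1)                          # fixed-size feature

feats = TemporalDifferenceFeatures()
print(feats(torch.randn(2, 50, 128)).shape)  # torch.Size([2, 2048])
```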
arXiv Detail & Related papers (2025-07-20T19:46:23Z)
- CAST: Cross-Attentive Spatio-Temporal feature fusion for Deepfake detection [0.0]
CNNs are effective at capturing spatial artifacts, and Transformers excel at modeling temporal inconsistencies.
We propose a unified CAST model that leverages cross-attention to effectively fuse spatial and temporal features.
We evaluate the performance of our model using the FaceForensics++, Celeb-DF, and DeepfakeDetection datasets.
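A minimal sketch of cross-attention fusion in this spirit, with spatial tokens querying temporal tokens; the dimensions, residual design, and pooling head are assumptions rather than the paper's architecture.

```python
# Spatial tokens (e.g., CNN feature-map patches) attend to temporal tokens
# (e.g., per-frame transformer embeddings) via standard cross-attention.
import torch
import torch.nn as nn

class CrossAttentiveFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, 1)  # real/fake logit

    def forward(self, spatial_tokens, temporal_tokens):
        # spatial_tokens: (B, S, dim); temporal_tokens: (B, T, dim)
        fused, _ = self.attn(query=spatial_tokens,
                             key=temporal_tokens,
                             value=temporal_tokens)
        fused = self.norm(fused + spatial_tokens)  # residual connection
        return self.head(fused.mean(dim=1))        # video-level logit

model = CrossAttentiveFusion()
logit = model(torch.randn(2, 49, 256), torch.randn(2, 16, 256))
print(logit.shape)  # torch.Size([2, 1])
```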
arXiv Detail & Related papers (2025-06-26T18:51:17Z)
- Seurat: From Moving Points to Depth [66.65189052568209]
We propose a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories.
Our approach achieves temporally smooth, high-accuracy depth predictions across diverse domains.
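The paper trains a model on tracked trajectories; as a toy illustration of the underlying motion-parallax cue only (not the paper's method), points that move less across frames under a translating camera are farther away.

```python
# Relative depth from 2D track speeds under an assumed translating camera:
# image speed scales roughly as 1/Z, so slower tracks are deeper.
import numpy as np

def relative_depth_from_tracks(tracks, eps=1e-6):
    """tracks: (N, T, 2) 2D point trajectories. Returns (N,) relative depth,
    normalized so the nearest point has depth ~1 (larger = farther)."""
    steps = np.diff(tracks, axis=1)                  # (N, T-1, 2)
    motion = np.linalg.norm(steps, axis=-1).mean(1)  # mean speed per track
    depth = 1.0 / (motion + eps)                     # parallax: speed ~ 1/Z
    return depth / depth.min()

# Synthetic check: true depths 1, 2, 4 give speeds 1/Z.
T = 10
zs = np.array([1.0, 2.0, 4.0])
tracks = np.stack([np.stack([np.arange(T) / z, np.zeros(T)], 1) for z in zs])
print(relative_depth_from_tracks(tracks))  # ~[1., 2., 4.]
```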
arXiv Detail & Related papers (2025-04-20T17:37:02Z)
- From Pixels to Trajectory: Universal Adversarial Example Detection via Temporal Imprints [21.454396392842426]
We unveil discernible temporal (or historical) trajectory imprints resulting from adversarial example (AE) attacks.
We propose TRAIT (TRaceable Adversarial temporal trajectory ImprinTs) for AE detection.
TRAIT achieves an AE detection accuracy exceeding 97%, often around 99%, while maintaining a false rejection rate of 1%.
arXiv Detail & Related papers (2025-03-06T06:00:04Z)
- DIP: Diffusion Learning of Inconsistency Pattern for General DeepFake Detection [18.116004258266535]
A transformer-based framework for Diffusion Inconsistency Learning (DIP) is proposed, which exploits directional inconsistencies for deepfake video detection.
Our method effectively identifies forgery clues and achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-10-31T06:26:00Z)
- Harnessing Wavelet Transformations for Generalizable Deepfake Forgery Detection [0.0]
Wavelet-CLIP is a deepfake detection framework that integrates wavelet transforms with features derived from the ViT-L/14 architecture, pre-trained in the CLIP fashion.
Our method showcases outstanding performance, achieving an average AUC of 0.749 for cross-data generalization and 0.893 for robustness against unseen deepfakes.
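A minimal sketch in the spirit of Wavelet-CLIP: a one-level Haar wavelet transform applied to frozen CLIP ViT-L/14 image embeddings (assumed precomputed here), followed by a linear classifier. The split and classifier details are assumptions, not the paper's exact design.

```python
# Haar DWT over feature dimensions of (stand-in) CLIP embeddings.
import torch
import torch.nn as nn

def haar_dwt_1d(x):
    """One-level orthonormal Haar DWT along the last (feature) dimension.
    Returns (approximation, detail), each with half the feature size."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / 2.0 ** 0.5
    detail = (even - odd) / 2.0 ** 0.5
    return approx, detail

classifier = nn.Linear(768, 2)           # 768 = CLIP ViT-L/14 embedding dim
feats = torch.randn(4, 768)              # stand-in for CLIP image embeddings
approx, detail = haar_dwt_1d(feats)      # (4, 384) each
logits = classifier(torch.cat([approx, detail], dim=-1))
print(logits.shape)  # torch.Size([4, 2])
```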
arXiv Detail & Related papers (2024-09-26T21:16:51Z)
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach [49.995833831087175]
This work proposes a novel method for generating generic spatio-temporal pseudo-anomalies (PAs) by inpainting a masked-out region of an image.
In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting.
Our method performs on par with other existing state-of-the-art PAs generation and reconstruction based methods under the OCC setting.
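A hedged sketch of PA generation by corrupting a masked-out region, as summarized above. The paper uses inpainting; here a patch copied from a different image is a crude stand-in for the inpainted content.

```python
# Mask a random square and fill it with foreign content, yielding a
# pseudo-anomaly plus its ground-truth anomaly mask for OCC training.
import numpy as np

def make_pseudo_anomaly(frame, donor, rng, size=32):
    """frame, donor: (H, W, 3) uint8 images of equal shape."""
    h, w = frame.shape[:2]
    y = rng.integers(0, h - size)
    x = rng.integers(0, w - size)
    out = frame.copy()
    out[y:y + size, x:x + size] = donor[y:y + size, x:x + size]
    mask = np.zeros((h, w), dtype=bool)
    mask[y:y + size, x:x + size] = True
    return out, mask

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (128, 128, 3), dtype=np.uint8)
donor = rng.integers(0, 256, (128, 128, 3), dtype=np.uint8)
pa, mask = make_pseudo_anomaly(frame, donor, rng)
print(pa.shape, mask.sum())  # (128, 128, 3) 1024
```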
arXiv Detail & Related papers (2023-11-27T13:14:06Z)
- Towards Generalizable Deepfake Detection by Primary Region Regularization [52.41801719896089]
This paper enhances the generalization capability from a novel regularization perspective.
Our method consists of two stages, namely the static localization for primary region maps, and the dynamic exploitation of primary region masks.
We conduct extensive experiments over three widely used deepfake datasets - DFDC, DF-1.0, and Celeb-DF with five backbones.
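An illustrative guess at the two stages named above: a static primary-region map thresholded into a mask, then dynamic exploitation by randomly occluding part of that region during training. All details here are assumptions, not the paper's procedure.

```python
# Stage 1: binarize a saliency map; Stage 2: randomly drop primary-region
# pixels so a detector cannot over-rely on one dominant region.
import numpy as np

def primary_region_mask(saliency, keep_quantile=0.8):
    return saliency >= np.quantile(saliency, keep_quantile)

def dynamic_occlusion(image, mask, rng, drop_frac=0.5):
    ys, xs = np.nonzero(mask)
    idx = rng.choice(len(ys), size=int(drop_frac * len(ys)), replace=False)
    out = image.copy()
    out[ys[idx], xs[idx]] = 0
    return out

rng = np.random.default_rng(0)
img = rng.random((64, 64))
sal = rng.random((64, 64))
aug = dynamic_occlusion(img, primary_region_mask(sal), rng)
print(aug.shape)  # (64, 64)
```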
arXiv Detail & Related papers (2023-07-24T05:43:34Z)
- On Robust Cross-View Consistency in Self-Supervised Monocular Depth Estimation [56.97699793236174]
We study two kinds of robust cross-view consistency in this paper.
We exploit the temporal coherence in both depth feature space and 3D voxel space for self-supervised monocular depth estimation.
Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques.
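A much-simplified sketch of temporal coherence in depth feature space: an L1 penalty between depth features of adjacent frames. The paper aligns views geometrically before comparing; that warping step is omitted here for brevity.

```python
# Temporal consistency penalty between (assumed view-aligned) depth features.
import torch

def temporal_feature_consistency(feat_t, feat_t1):
    """feat_t, feat_t1: (B, C, H, W) depth-decoder features at frames t, t+1."""
    return (feat_t - feat_t1).abs().mean()

loss = temporal_feature_consistency(torch.randn(2, 64, 24, 80),
                                    torch.randn(2, 64, 24, 80))
print(float(loss) > 0)  # True
```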
arXiv Detail & Related papers (2022-09-19T03:46:13Z)
- Delving into Sequential Patches for Deepfake Detection [64.19468088546743]
Recent advances in face forgery techniques produce nearly untraceable deepfake videos, which could be leveraged with malicious intent.
Previous studies have identified the importance of local low-level cues and temporal information in the pursuit of generalizing well across deepfake methods.
We propose the Local- & Temporal-aware Transformer-based Deepfake Detection framework, which adopts a local-to-global learning protocol.
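A sketch of a local-to-global protocol in the spirit of the summary above: local per-patch embeddings from each frame are flattened into one sequence and aggregated by a small transformer encoder. All sizes and the pooling head are illustrative.

```python
# Local patch embedding followed by global spatio-temporal mixing.
import torch
import torch.nn as nn

class LocalToGlobal(nn.Module):
    def __init__(self, patch_dim=108, dim=128, layers=2):
        super().__init__()
        self.local = nn.Linear(patch_dim, dim)  # local patch embedding
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.globl = nn.TransformerEncoder(enc, num_layers=layers)
        self.head = nn.Linear(dim, 1)

    def forward(self, patches):
        # patches: (B, T, P, patch_dim) low-level patches per frame
        b, t, p, _ = patches.shape
        tokens = self.local(patches).reshape(b, t * p, -1)
        tokens = self.globl(tokens)           # global temporal-spatial mixing
        return self.head(tokens.mean(dim=1))  # video-level logit

model = LocalToGlobal()
print(model(torch.randn(2, 8, 16, 108)).shape)  # torch.Size([2, 1])
```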
arXiv Detail & Related papers (2022-07-06T16:46:30Z)
- MC-LCR: Multi-modal contrastive classification by locally correlated representations for effective face forgery detection [11.124150983521158]
We propose a novel framework named Multi-modal Contrastive Classification by Locally Correlated Representations.
Our MC-LCR aims to amplify implicit local discrepancies between authentic and forged faces from both spatial and frequency domains.
We achieve state-of-the-art performance and demonstrate the robustness and generalization of our method.
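A minimal sketch of pairing spatial and frequency views of a face crop, in the spirit of MC-LCR's two domains; the paper's exact features and contrastive loss differ, so treat this as illustrative only.

```python
# Build a spatial view and a log-amplitude spectrum view of each face crop;
# both would feed separate encoders whose outputs are contrasted.
import torch

def spatial_and_frequency_views(face):
    """face: (B, 1, H, W) grayscale crops."""
    spectrum = torch.fft.fft2(face)
    amplitude = torch.log1p(torch.abs(torch.fft.fftshift(spectrum)))
    return face, amplitude

spatial, freq = spatial_and_frequency_views(torch.rand(4, 1, 64, 64))
print(spatial.shape, freq.shape)  # both torch.Size([4, 1, 64, 64])
```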
arXiv Detail & Related papers (2021-10-07T09:24:12Z)
- Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
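A hedged sketch of a cycle-consistency loss on predicted geometric transformations: composing A->B, B->C, and C->A should approximate the identity. The homogeneous matrices below stand in for a matching network's outputs.

```python
# Penalize the deviation of a composed transformation cycle from identity.
import torch

def cycle_consistency_loss(t_ab, t_bc, t_ca):
    """Each input: (B, 3, 3) homogeneous transforms predicted by the network."""
    cycle = t_ca @ t_bc @ t_ab                    # A -> B -> C -> A
    eye = torch.eye(3, device=cycle.device).expand_as(cycle)
    return (cycle - eye).pow(2).mean()

b = 2
t = torch.eye(3).expand(b, 3, 3).clone()
t[:, 0, 2] = 1.0                                  # small x-translation
loss = cycle_consistency_loss(t, t, t)            # not a closed cycle
print(float(loss) > 0)  # True: the composed cycle drifts from identity
```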
arXiv Detail & Related papers (2020-03-31T22:38:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.