Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection
- URL: http://arxiv.org/abs/2512.04175v1
- Date: Wed, 03 Dec 2025 19:00:07 GMT
- Title: Beyond Flicker: Detecting Kinematic Inconsistencies for Generalizable Deepfake Video Detection
- Authors: Alejandro Cobo, Roberto Valle, José Miguel Buenaposada, Luis Baumela
- Abstract summary: Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach is to train a network with pristine face images that have been manipulated with hand-crafted artifacts to extract more generalizable clues. We propose a synthetic video generation method that creates training data with subtle kinematic inconsistencies.
- Score: 41.44337153700898
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach to tackle this issue is to train a network with pristine face images that have been manipulated with hand-crafted artifacts to extract more generalizable clues. While effective for static images, extending this to the video domain is an open issue. Existing methods model temporal artifacts as frame-to-frame instabilities, overlooking a key vulnerability: the violation of natural motion dependencies between different facial regions. In this paper, we propose a synthetic video generation method that creates training data with subtle kinematic inconsistencies. We train an autoencoder to decompose facial landmark configurations into motion bases. By manipulating these bases, we selectively break the natural correlations in facial movements and introduce these artifacts into pristine videos via face morphing. A network trained on our data learns to spot these sophisticated biomechanical flaws, achieving state-of-the-art generalization results on several popular benchmarks.
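The pipeline the abstract describes (decompose landmark motion into bases, selectively perturb one basis, reconstruct) can be sketched with PCA standing in for the learned autoencoder. Everything below is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def motion_bases(landmarks, k=5):
    """Decompose a landmark sequence (T frames x 2L coords) into k motion
    bases via PCA -- an illustrative stand-in for the paper's autoencoder."""
    mean = landmarks.mean(axis=0)
    centered = landmarks - mean
    # SVD rows of Vt are orthogonal motion bases
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coeffs = centered @ Vt[:k].T        # (T, k) per-frame activations
    return mean, Vt[:k], coeffs

def break_correlation(mean, bases, coeffs, idx=0, seed=0):
    """Perturb one basis activation over time while leaving the others
    untouched -- a kinematic inconsistency in the spirit of the paper."""
    rng = np.random.default_rng(seed)
    perturbed = coeffs.copy()
    # shuffle one component's trajectory so it no longer co-varies
    # with the remaining facial motions
    perturbed[:, idx] = rng.permutation(perturbed[:, idx])
    return mean + perturbed @ bases     # reconstructed inconsistent landmarks

# toy usage: 30 frames of 68 2-D landmarks
seq = np.random.default_rng(1).normal(size=(30, 68 * 2))
mean, bases, coeffs = motion_bases(seq, k=5)
fake = break_correlation(mean, bases, coeffs, idx=0)
print(fake.shape)  # (30, 136)
```

In the paper the perturbed landmark configurations then drive face morphing of pristine frames; that warping step is omitted here.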
Related papers
- Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection [14.586314545834934]
We propose a fine-grained deepfake video detection approach called FakeSTormer. Specifically, we introduce a multi-task learning framework that incorporates two auxiliary branches for explicitly attending to artifact-prone spatial and temporal regions. We also propose a video-level synthesis strategy that generates pseudo-fake videos with subtle spatio-temporal artifacts.
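A multi-task framework with auxiliary branches like the one above typically reduces to a weighted sum of a main loss and auxiliary losses; a minimal sketch, where all loss choices and weights are assumptions rather than FakeSTormer's actual formulation:

```python
import numpy as np

def bce(logits, labels):
    """Binary cross-entropy on real/fake logits (main branch)."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

def mse(pred, target):
    """Regression loss for an auxiliary branch's prediction map."""
    return float(np.mean((pred - target) ** 2))

def multitask_loss(main_logits, labels,
                   spatial_pred, spatial_target,
                   temporal_pred, temporal_target,
                   w_spatial=0.5, w_temporal=0.5):
    """Main real/fake loss plus two auxiliary terms that supervise
    attention to artifact-prone spatial/temporal regions.
    Weights are illustrative, not taken from the paper."""
    return (bce(main_logits, labels)
            + w_spatial * mse(spatial_pred, spatial_target)
            + w_temporal * mse(temporal_pred, temporal_target))

# toy example: two clips, perfect auxiliary predictions
loss = multitask_loss(np.array([2.0, -2.0]), np.array([1.0, 0.0]),
                      np.zeros(4), np.zeros(4), np.zeros(4), np.zeros(4))
print(round(loss, 4))  # 0.1269
```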
arXiv Detail & Related papers (2025-01-02T10:21:34Z) - Learning Expressive And Generalizable Motion Features For Face Forgery Detection [52.54404879581527]
We propose an effective sequence-based forgery detection framework based on an existing video classification method.
To make the motion features more expressive for manipulation detection, we propose an alternative motion consistency block.
We make a general video classification network achieve promising results on three popular face forgery datasets.
arXiv Detail & Related papers (2024-03-08T09:25:48Z) - Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection [33.29744034340998]
Mover is a new Deepfake detection model that exploits unspecific facial part inconsistencies.
We propose a novel model with dual networks that utilize the pretrained encoder and masked autoencoder.
Our experiments on standard benchmarks demonstrate that Mover is highly effective.
arXiv Detail & Related papers (2023-03-03T06:57:22Z) - Face Forgery Detection Based on Facial Region Displacement Trajectory Series [10.338298543908339]
We develop a method for detecting manipulated videos based on the trajectory of the facial region displacement.
This information is used to construct a network for exposing multidimensional artifacts in the trajectory sequences of manipulated videos.
arXiv Detail & Related papers (2022-12-07T14:47:54Z) - Multimodal Graph Learning for Deepfake Detection [10.077496841634135]
Existing deepfake detectors face several challenges in achieving robustness and generalization.
We propose a novel framework, namely Multimodal Graph Learning (MGL), that leverages information from multiple modalities.
Our proposed method aims to effectively identify and utilize distinguishing features for deepfake detection.
arXiv Detail & Related papers (2022-09-12T17:17:49Z) - Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption [94.5031244215761]
We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
arXiv Detail & Related papers (2022-07-21T10:42:34Z) - Deepfake Video Detection with Spatiotemporal Dropout Transformer [32.577096083927884]
This paper proposes a simple yet effective patch-level approach to facilitate deepfake video detection via a dropout transformer.
The approach reorganizes each input video into a bag of patches that is then fed into a vision transformer to achieve a robust representation.
arXiv Detail & Related papers (2022-07-14T02:04:42Z) - Copy Motion From One to Another: Fake Motion Video Generation [53.676020148034034]
A compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion.
Current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos.
We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image.
Our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
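The Gromov-Wasserstein loss mentioned above compares intra-domain distance structures rather than raw pixels. A rough numpy sketch of the discrepancy under a fixed coupling (an illustration of the general idea, not the paper's actual loss):

```python
import numpy as np

def gw_discrepancy(X, Y, T=None):
    """Simplified Gromov-Wasserstein discrepancy between point clouds X and Y:
    compares their intra-set pairwise-distance matrices under a coupling T.
    Uniform coupling by default; a sketch, not an optimized OT solver."""
    n, m = len(X), len(Y)
    C1 = np.linalg.norm(X[:, None] - X[None], axis=-1)   # (n, n) distances
    C2 = np.linalg.norm(Y[:, None] - Y[None], axis=-1)   # (m, m) distances
    if T is None:
        T = np.full((n, m), 1.0 / (n * m))               # independent coupling
    diff2 = (C1[:, :, None, None] - C2[None, None, :, :]) ** 2
    # sum_{i,j,k,l} (C1[i,j] - C2[k,l])^2 * T[i,k] * T[j,l]
    return float(np.einsum('ijkl,ik,jl->', diff2, T, T))

# a translated copy preserves distance structure, so the identity
# coupling yields (numerically) zero discrepancy
X = np.random.default_rng(0).normal(size=(4, 2))
T_id = np.eye(4) / 4
print(gw_discrepancy(X, X + 5.0, T_id))  # ~0 (translation-invariant)
```

A full GW loss would additionally optimize over the coupling T; libraries such as POT provide solvers for that.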
arXiv Detail & Related papers (2022-05-03T08:45:22Z) - Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection [112.96004727646115]
We develop a method to detect face-manipulated videos using real talking faces.
We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments.
Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
arXiv Detail & Related papers (2022-01-18T17:14:54Z) - Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.