FakeTransformer: Exposing Face Forgery From Spatial-Temporal
Representation Modeled By Facial Pixel Variations
- URL: http://arxiv.org/abs/2111.07601v1
- Date: Mon, 15 Nov 2021 08:44:52 GMT
- Title: FakeTransformer: Exposing Face Forgery From Spatial-Temporal
Representation Modeled By Facial Pixel Variations
- Authors: Yuyang Sun, Zhiyong Zhang, Changzhen Qiu, Liang Wang and Zekai Wang
- Abstract summary: Face forgery can attack any target, which poses a new threat to personal privacy and property security.
Inspired by the fact that the spatial coherence and temporal consistency of physiological signals are destroyed in generated content, we attempt to find inconsistent patterns that distinguish real videos from synthetic videos.
- Score: 8.194624568473126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of generative models, AI-based face manipulation
technology, known as DeepFakes, has become increasingly realistic. This
means of face forgery can attack any target, which poses a new threat to
personal privacy and property security. Moreover, the misuse of synthetic video
shows potential dangers in many areas, such as identity harassment, pornography
and news rumors. Inspired by the fact that the spatial coherence and temporal
consistency of physiological signals are destroyed in generated content, we
attempt to find inconsistent patterns that can distinguish real videos
from synthetic videos based on the variations of facial pixels, which are highly
related to physiological information. Our approach first applies Eulerian Video
Magnification (EVM) at multiple Gaussian scales to the original video to
enlarge the physiological variations caused by changes in facial blood
volume, and then transforms the original and magnified videos into a
Multi-Scale Eulerian Magnified Spatial-Temporal map (MEMSTmap), which
represents time-varying physiological enhancement sequences at different
octaves. These maps are then reshaped into frame patches in column units and
fed to a vision Transformer to learn frame-level spatio-temporal descriptors.
Finally, we aggregate the feature embeddings and output the probability
that the video is real or fake. We validate our method on the
FaceForensics++ and DeepFake Detection datasets. The results show that our
model achieves excellent performance in forgery detection, and also demonstrates
outstanding generalization capability in cross-dataset evaluation.
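To make the pipeline above concrete, here is a minimal, hedged Python sketch. It approximates the EVM step as a temporal bandpass in an assumed heart-rate band (~0.8-3 Hz) on average-pooled pyramid levels, collapses each magnified octave to one RGB row per frame to form the map, and feeds the rows as column-unit patches to a tiny Transformer. Every name here (gaussian_downsample, memst_map, TinyViT) and the exact map construction are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the FakeTransformer pipeline described in the abstract.
# All function names and parameters are assumptions for illustration only.
import numpy as np
import torch
import torch.nn as nn

def gaussian_downsample(video: np.ndarray, octave: int) -> np.ndarray:
    """Crude Gaussian-pyramid level via repeated 2x2 average pooling.
    video: (T, H, W, 3) float array in [0, 1]."""
    out = video
    for _ in range(octave):
        t, h, w, c = out.shape
        out = out[:, : h // 2 * 2, : w // 2 * 2, :]
        out = 0.25 * (out[:, ::2, ::2] + out[:, 1::2, ::2]
                      + out[:, ::2, 1::2] + out[:, 1::2, 1::2])
    return out

def temporal_bandpass(video: np.ndarray, fps: float,
                      lo: float = 0.8, hi: float = 3.0) -> np.ndarray:
    """FFT bandpass over time in an assumed heart-rate band (~0.8-3 Hz),
    standing in for the EVM temporal filter."""
    freqs = np.fft.rfftfreq(video.shape[0], d=1.0 / fps)
    spec = np.fft.rfft(video, axis=0)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=video.shape[0], axis=0)

def magnify(video, fps, octave, alpha=20.0):
    """EVM-style magnification of one pyramid octave."""
    level = gaussian_downsample(video, octave)
    return level + alpha * temporal_bandpass(level, fps)

def memst_map(video, fps, octaves=(1, 2, 3)) -> np.ndarray:
    """Assumed MEMSTmap construction: spatially average each magnified
    octave to one RGB value per frame, stack frames over time (rows)
    and concatenate octaves (columns)."""
    cols = [magnify(video, fps, o).mean(axis=(1, 2)) for o in octaves]  # (T, 3) each
    return np.concatenate(cols, axis=1)                                 # (T, 3 * len(octaves))

class TinyViT(nn.Module):
    """Minimal Transformer encoder over column-unit patches of the map."""
    def __init__(self, patch_dim, dim=64, heads=4, depth=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 2)                 # real-vs-fake logits

    def forward(self, patches):                       # patches: (B, N, patch_dim)
        x = self.embed(patches)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1))
        return self.head(x[:, 0])                     # classify from the [CLS] token

# Usage on a dummy face clip: 64 frames at 25 fps.
clip = np.random.rand(64, 128, 128, 3).astype(np.float32)
m = memst_map(clip, fps=25.0)                         # (64, 9)
patches = torch.from_numpy(m).float().unsqueeze(0)    # (1, 64, 9)
logits = TinyViT(patch_dim=m.shape[1])(patches)       # (1, 2)
```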
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion and financial fraud.
In our research, we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of a face's presence in a video.
We employ our approach to analyze videos in which multiple faces are present simultaneously.
arXiv Detail & Related papers (2024-10-10T13:10:34Z)
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing [23.325272595629773]
In videos containing spoofed faces, spoofing evidence may be uncovered from either photometric or dynamic abnormalities.
We propose the Graph Guided Video Vision Transformer, which combines faces with facial landmarks for photometric and dynamic feature fusion.
arXiv Detail & Related papers (2024-08-14T17:22:41Z)
- Do As I Do: Pose Guided Human Motion Copy [39.40271266234068]
Motion copy is an intriguing yet challenging task in artificial intelligence and computer vision.
Existing approaches typically adopt a conventional GAN with an L1 or L2 loss to produce the target fake video (this baseline objective is sketched after the entry).
We present an episodic memory module in the pose-to-appearance generation to propel continuous learning.
Our method significantly outperforms state-of-the-art approaches, gaining improvements of 7.2% in PSNR and 12.4% in FID.
arXiv Detail & Related papers (2024-06-24T12:41:51Z)
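A sketch of the conventional objective this summary contrasts against: a pose-to-appearance generator trained with an adversarial term plus an L1 reconstruction term. The weighting lambda_l1 is an assumed placeholder; the paper's episodic memory module is not reproduced here.

```python
# Conventional GAN-with-L1 generator objective, as referenced in the summary.
# lambda_l1 is an illustrative weight, not a value from the paper.
import torch
import torch.nn.functional as F

def generator_loss(fake_frames: torch.Tensor, real_frames: torch.Tensor,
                   disc_logits_on_fake: torch.Tensor,
                   lambda_l1: float = 10.0) -> torch.Tensor:
    # Adversarial term: push the discriminator to label fakes as real.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    # L1 reconstruction term against the ground-truth target frames.
    recon = F.l1_loss(fake_frames, real_frames)
    return adv + lambda_l1 * recon
```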
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
- Deepfake detection by exploiting surface anomalies: the SurFake approach [29.088218634944116]
This paper investigates how deepfake creation can impact the characteristics that the whole scene had at the time of acquisition.
By analyzing the characteristics of the surfaces depicted in the image, it is possible to obtain a descriptor that can be used to train a CNN for deepfake detection.
arXiv Detail & Related papers (2023-10-31T16:54:14Z)
- Face Forgery Detection Based on Facial Region Displacement Trajectory Series [10.338298543908339]
We develop a method for detecting manipulated videos based on the displacement trajectories of facial regions.
This information is used to construct a network for exposing multidimensional artifacts in the trajectory sequences of manipulated videos (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-12-07T14:47:54Z)
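A hedged sketch of trajectory-based detection as summarized above: form per-frame landmark displacement vectors and classify the resulting series with a small recurrent network. The landmark source and the GRU classifier are stand-ins; the paper's actual network for multidimensional artifacts differs.

```python
# Sketch: landmark displacement series -> tiny sequence classifier.
# The landmark detector is assumed to be any off-the-shelf tool
# (e.g. dlib or MediaPipe); this is not the paper's architecture.
import numpy as np
import torch
import torch.nn as nn

def displacement_series(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (T, K, 2) per-frame 2D landmark positions.
    Returns (T - 1, K * 2) frame-to-frame displacement vectors."""
    disp = np.diff(landmarks, axis=0)          # (T - 1, K, 2)
    return disp.reshape(disp.shape[0], -1)

class TrajectoryClassifier(nn.Module):
    """Tiny GRU over the displacement series, as a stand-in for the
    paper's artifact-exposing network."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):                      # x: (B, T-1, in_dim)
        _, h = self.rnn(x)                     # h: (1, B, hidden)
        return self.head(h[-1])                # real / fake logits
```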
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- Sharp Multiple Instance Learning for DeepFake Video Detection [54.12548421282696]
We introduce a new problem of partial face attack in DeepFake videos, where only video-level labels are provided but not all faces in the fake videos are manipulated.
A sharp multiple-instance learning scheme (S-MIL) is proposed, which builds a direct mapping from instance embeddings to the bag prediction (a pooling sketch follows this entry).
Experiments on FFPMS and the widely used DFDC dataset verify that S-MIL is superior to its counterparts for partially attacked DeepFake video detection.
arXiv Detail & Related papers (2020-08-11T08:52:17Z)
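A hedged sketch of the instance-to-bag mapping idea behind S-MIL. The log-mean-exp pooling below is one standard "sharp" operator that interpolates between mean and max pooling; it is an illustrative choice, not necessarily the exact formulation in the paper.

```python
# Sharp pooling from per-face (instance) logits to a video-level (bag) score.
# The operator and the sharpness parameter r are illustrative assumptions.
import math
import torch

def sharp_bag_score(instance_logits: torch.Tensor, r: float = 10.0) -> torch.Tensor:
    """instance_logits: (B, N) fakeness logits, one per face instance.
    Computes (1/r) * log(mean(exp(r * x))): large r approaches max pooling,
    small r approaches mean pooling -- a 'sharp' instance-to-bag mapping."""
    n = instance_logits.size(1)
    return (torch.logsumexp(r * instance_logits, dim=1) - math.log(n)) / r

# A bag is fake if any instance is manipulated, so training can apply a
# plain BCE loss on the pooled score against the video-level label.
```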
- Dynamic texture analysis for detecting fake faces in video sequences [6.1356022122903235]
This work explores the analysis of the texture-temporal dynamics of the video signal.
The goal is to characterize and distinguish real from fake sequences.
We propose to build multiple binary decisions on the joint analysis of temporal segments (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-07-30T07:21:24Z)
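A minimal sketch of the segment-wise scheme summarized above: split a clip into fixed-length temporal segments, take a binary decision per segment, and fuse the decisions by majority vote. The per-segment classifier is left abstract here; the paper derives it from dynamic-texture analysis.

```python
# Segment-wise binary decisions fused by majority vote.
# classify_segment is an assumed placeholder: any callable mapping a
# (seg_len, H, W, 3) segment to a fake probability in [0, 1].
import numpy as np

def video_is_fake(frames: np.ndarray, classify_segment, seg_len: int = 16) -> bool:
    """frames: (T, H, W, 3). Returns True if the majority of segment-level
    binary decisions flag the clip as fake."""
    starts = range(0, len(frames) - seg_len + 1, seg_len)
    votes = [classify_segment(frames[s:s + seg_len]) > 0.5 for s in starts]
    return float(np.mean(votes)) > 0.5   # majority vote over segments
```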
- Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks [54.82488484053263]
Deep neural networks for video classification may be subjected to adversarial manipulation.
We present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation (sketched after this entry).
The attack was implemented against several target models, and its transferability was demonstrated.
arXiv Detail & Related papers (2020-02-12T17:58:12Z)
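A hedged sketch of the flickering idea: one RGB offset per frame, constant across all pixels, clipped to a small amplitude so it appears on screen as a subtle flicker. Optimizing delta by gradient steps against the classifier's loss (e.g. PGD-style), optionally with a temporal-smoothness penalty, yields the attack described; the parameter names here are illustrative.

```python
# Applying a flickering temporal perturbation to a video clip.
# delta and eps are assumed names; the optimization loop is omitted.
import torch

def apply_flicker(clip: torch.Tensor, delta: torch.Tensor,
                  eps: float = 0.1) -> torch.Tensor:
    """clip: (T, 3, H, W) video in [0, 1]; delta: (T, 3) learnable per-frame
    RGB offsets, broadcast over all pixels of each frame."""
    delta = delta.clamp(-eps, eps)                     # bound flicker amplitude
    return (clip + delta[:, :, None, None]).clamp(0.0, 1.0)
```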