FakeTransformer: Exposing Face Forgery From Spatial-Temporal
Representation Modeled By Facial Pixel Variations
- URL: http://arxiv.org/abs/2111.07601v1
- Date: Mon, 15 Nov 2021 08:44:52 GMT
- Title: FakeTransformer: Exposing Face Forgery From Spatial-Temporal
Representation Modeled By Facial Pixel Variations
- Authors: Yuyang Sun, Zhiyong Zhang, Changzhen Qiu, Liang Wang and Zekai Wang
- Abstract summary: Face forgery can attack any target, which poses a new threat to personal privacy and property security.
Inspired by the fact that the spatial coherence and temporal consistency of physiological signals are destroyed in generated content, we attempt to find inconsistent patterns that distinguish real videos from synthetic videos.
- Score: 8.194624568473126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of generative models, AI-based face manipulation
technology, known as DeepFakes, has become increasingly realistic. This
means of face forgery can attack any target, which poses a new threat to
personal privacy and property security. Moreover, the misuse of synthetic video
shows potential dangers in many areas, such as identity harassment, pornography
and news rumors. Inspired by the fact that the spatial coherence and temporal
consistency of physiological signals are destroyed in generated content, we
attempt to find inconsistent patterns that can distinguish real videos
from synthetic videos based on the variations of facial pixels, which are highly
related to physiological information. Our approach first applies Eulerian Video
Magnification (EVM) at multiple Gaussian scales to the original video to
enlarge the physiological variations caused by changes in facial blood
volume, and then transforms the original and magnified videos into a
Multi-Scale Eulerian Magnified Spatial-Temporal map (MEMSTmap), which
represents time-varying physiological enhancement sequences at different
octaves. These maps are then reshaped into frame patches in column units and
fed to a vision Transformer to learn frame-level spatio-temporal descriptors.
Finally, we aggregate the feature embeddings and output the probability
that the video is real or fake. We validate our method on the
FaceForensics++ and DeepFake Detection datasets. The results show that our
model achieves excellent performance in forgery detection, and also demonstrates
outstanding generalization capability in cross-dataset evaluation.
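To make the pipeline above concrete, here is a minimal, hedged Python sketch. It approximates the EVM step as a temporal bandpass in an assumed heart-rate band (~0.8-3 Hz) on average-pooled pyramid levels, collapses each magnified octave to one RGB row per frame to form the map, and feeds the rows as column-unit patches to a tiny Transformer. Every name here (gaussian_downsample, memst_map, TinyViT) and the exact map construction are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the FakeTransformer pipeline described in the abstract.
# All function names and parameters are assumptions for illustration only.
import numpy as np
import torch
import torch.nn as nn

def gaussian_downsample(video: np.ndarray, octave: int) -> np.ndarray:
    """Crude Gaussian-pyramid level via repeated 2x2 average pooling.
    video: (T, H, W, 3) float array in [0, 1]."""
    out = video
    for _ in range(octave):
        t, h, w, c = out.shape
        out = out[:, : h // 2 * 2, : w // 2 * 2, :]
        out = 0.25 * (out[:, ::2, ::2] + out[:, 1::2, ::2]
                      + out[:, ::2, 1::2] + out[:, 1::2, 1::2])
    return out

def temporal_bandpass(video: np.ndarray, fps: float,
                      lo: float = 0.8, hi: float = 3.0) -> np.ndarray:
    """FFT bandpass over time in an assumed heart-rate band (~0.8-3 Hz),
    standing in for the EVM temporal filter."""
    freqs = np.fft.rfftfreq(video.shape[0], d=1.0 / fps)
    spec = np.fft.rfft(video, axis=0)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=video.shape[0], axis=0)

def magnify(video, fps, octave, alpha=20.0):
    """EVM-style magnification of one pyramid octave."""
    level = gaussian_downsample(video, octave)
    return level + alpha * temporal_bandpass(level, fps)

def memst_map(video, fps, octaves=(1, 2, 3)) -> np.ndarray:
    """Assumed MEMSTmap construction: spatially average each magnified
    octave to one RGB value per frame, stack frames over time (rows)
    and concatenate octaves (columns)."""
    cols = [magnify(video, fps, o).mean(axis=(1, 2)) for o in octaves]  # (T, 3) each
    return np.concatenate(cols, axis=1)                                 # (T, 3 * len(octaves))

class TinyViT(nn.Module):
    """Minimal Transformer encoder over column-unit patches of the map."""
    def __init__(self, patch_dim, dim=64, heads=4, depth=2):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 2)                 # real-vs-fake logits

    def forward(self, patches):                       # patches: (B, N, patch_dim)
        x = self.embed(patches)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1))
        return self.head(x[:, 0])                     # classify from the [CLS] token

# Usage on a dummy face clip: 64 frames at 25 fps.
clip = np.random.rand(64, 128, 128, 3).astype(np.float32)
m = memst_map(clip, fps=25.0)                         # (64, 9)
patches = torch.from_numpy(m).float().unsqueeze(0)    # (1, 64, 9)
logits = TinyViT(patch_dim=m.shape[1])(patches)       # (1, 2)
```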
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion and financial fraud.
In our research, we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of a face's presence in a video.
We employ our approach to analyze videos in which multiple faces are present simultaneously.
arXiv Detail & Related papers (2024-10-10T13:10:34Z)
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing [23.325272595629773]
In videos containing spoofed faces, spoofing evidence may be uncovered from either photometric or dynamic abnormalities.
We propose the Graph Guided Video Vision Transformer, which combines faces with facial landmarks for photometric and dynamic feature fusion.
arXiv Detail & Related papers (2024-08-14T17:22:41Z)
- Do As I Do: Pose Guided Human Motion Copy [39.40271266234068]
Motion copy is an intriguing yet challenging task in artificial intelligence and computer vision.
Existing approaches typically adopt a conventional GAN with an L1 or L2 loss to produce the target fake video (this baseline objective is sketched after the entry).
We present an episodic memory module in the pose-to-appearance generation to propel continuous learning.
Our method significantly outperforms state-of-the-art approaches, gaining improvements of 7.2% in PSNR and 12.4% in FID.
arXiv Detail & Related papers (2024-06-24T12:41:51Z)
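A sketch of the conventional objective this summary contrasts against: a pose-to-appearance generator trained with an adversarial term plus an L1 reconstruction term. The weighting lambda_l1 is an assumed placeholder; the paper's episodic memory module is not reproduced here.

```python
# Conventional GAN-with-L1 generator objective, as referenced in the summary.
# lambda_l1 is an illustrative weight, not a value from the paper.
import torch
import torch.nn.functional as F

def generator_loss(fake_frames: torch.Tensor, real_frames: torch.Tensor,
                   disc_logits_on_fake: torch.Tensor,
                   lambda_l1: float = 10.0) -> torch.Tensor:
    # Adversarial term: push the discriminator to label fakes as real.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits_on_fake, torch.ones_like(disc_logits_on_fake))
    # L1 reconstruction term against the ground-truth target frames.
    recon = F.l1_loss(fake_frames, real_frames)
    return adv + lambda_l1 * recon
```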
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
- Deepfake detection by exploiting surface anomalies: the SurFake approach [29.088218634944116]
This paper investigates how deepfake creation can impact the characteristics that the whole scene had at the time of acquisition.
By analyzing the characteristics of the surfaces depicted in the image, it is possible to obtain a descriptor that can be used to train a CNN for deepfake detection.
arXiv Detail & Related papers (2023-10-31T16:54:14Z)
- Face Forgery Detection Based on Facial Region Displacement Trajectory Series [10.338298543908339]
We develop a method for detecting manipulated videos based on the displacement trajectories of facial regions.
This information is used to construct a network for exposing multidimensional artifacts in the trajectory sequences of manipulated videos (a minimal sketch follows this entry).
arXiv Detail & Related papers (2022-12-07T14:47:54Z)
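A hedged sketch of trajectory-based detection as summarized above: form per-frame landmark displacement vectors and classify the resulting series with a small recurrent network. The landmark source and the GRU classifier are stand-ins; the paper's actual network for multidimensional artifacts differs.

```python
# Sketch: landmark displacement series -> tiny sequence classifier.
# The landmark detector is assumed to be any off-the-shelf tool
# (e.g. dlib or MediaPipe); this is not the paper's architecture.
import numpy as np
import torch
import torch.nn as nn

def displacement_series(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (T, K, 2) per-frame 2D landmark positions.
    Returns (T - 1, K * 2) frame-to-frame displacement vectors."""
    disp = np.diff(landmarks, axis=0)          # (T - 1, K, 2)
    return disp.reshape(disp.shape[0], -1)

class TrajectoryClassifier(nn.Module):
    """Tiny GRU over the displacement series, as a stand-in for the
    paper's artifact-exposing network."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):                      # x: (B, T-1, in_dim)
        _, h = self.rnn(x)                     # h: (1, B, hidden)
        return self.head(h[-1])                # real / fake logits
```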
- Image-to-Video Generation via 3D Facial Dynamics [78.01476554323179]
We present a versatile model, FaceAnime, for various video generation tasks from still images.
Our model is versatile for various AR/VR and entertainment applications, such as face video generation and face video prediction.
arXiv Detail & Related papers (2021-05-31T02:30:11Z)
- Sharp Multiple Instance Learning for DeepFake Video Detection [54.12548421282696]
We introduce a new problem of partial face attack in DeepFake videos, where only video-level labels are provided but not all faces in the fake videos are manipulated.
A sharp multiple-instance learning scheme (S-MIL) is proposed, which builds a direct mapping from instance embeddings to the bag prediction (a pooling sketch follows this entry).
Experiments on FFPMS and the widely used DFDC dataset verify that S-MIL is superior to its counterparts for partially attacked DeepFake video detection.
arXiv Detail & Related papers (2020-08-11T08:52:17Z)
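A hedged sketch of the instance-to-bag mapping idea behind S-MIL. The log-mean-exp pooling below is one standard "sharp" operator that interpolates between mean and max pooling; it is an illustrative choice, not necessarily the exact formulation in the paper.

```python
# Sharp pooling from per-face (instance) logits to a video-level (bag) score.
# The operator and the sharpness parameter r are illustrative assumptions.
import math
import torch

def sharp_bag_score(instance_logits: torch.Tensor, r: float = 10.0) -> torch.Tensor:
    """instance_logits: (B, N) fakeness logits, one per face instance.
    Computes (1/r) * log(mean(exp(r * x))): large r approaches max pooling,
    small r approaches mean pooling -- a 'sharp' instance-to-bag mapping."""
    n = instance_logits.size(1)
    return (torch.logsumexp(r * instance_logits, dim=1) - math.log(n)) / r

# A bag is fake if any instance is manipulated, so training can apply a
# plain BCE loss on the pooled score against the video-level label.
```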
- Dynamic texture analysis for detecting fake faces in video sequences [6.1356022122903235]
This work explores the analysis of the texture-temporal dynamics of the video signal.
The goal is to characterize and distinguish real from fake sequences.
We propose to build multiple binary decisions on the joint analysis of temporal segments (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-07-30T07:21:24Z)
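A minimal sketch of the segment-wise scheme summarized above: split a clip into fixed-length temporal segments, take a binary decision per segment, and fuse the decisions by majority vote. The per-segment classifier is left abstract here; the paper derives it from dynamic-texture analysis.

```python
# Segment-wise binary decisions fused by majority vote.
# classify_segment is an assumed placeholder: any callable mapping a
# (seg_len, H, W, 3) segment to a fake probability in [0, 1].
import numpy as np

def video_is_fake(frames: np.ndarray, classify_segment, seg_len: int = 16) -> bool:
    """frames: (T, H, W, 3). Returns True if the majority of segment-level
    binary decisions flag the clip as fake."""
    starts = range(0, len(frames) - seg_len + 1, seg_len)
    votes = [classify_segment(frames[s:s + seg_len]) > 0.5 for s in starts]
    return float(np.mean(votes)) > 0.5   # majority vote over segments
```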
- Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks [54.82488484053263]
Deep neural networks for video classification may be subjected to adversarial manipulation.
We present a manipulation scheme for fooling video classifiers by introducing a flickering temporal perturbation (sketched after this entry).
The attack was implemented against several target models, and its transferability was demonstrated.
arXiv Detail & Related papers (2020-02-12T17:58:12Z)
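A hedged sketch of the flickering idea: one RGB offset per frame, constant across all pixels, clipped to a small amplitude so it appears on screen as a subtle flicker. Optimizing delta by gradient steps against the classifier's loss (e.g. PGD-style), optionally with a temporal-smoothness penalty, yields the attack described; the parameter names here are illustrative.

```python
# Applying a flickering temporal perturbation to a video clip.
# delta and eps are assumed names; the optimization loop is omitted.
import torch

def apply_flicker(clip: torch.Tensor, delta: torch.Tensor,
                  eps: float = 0.1) -> torch.Tensor:
    """clip: (T, 3, H, W) video in [0, 1]; delta: (T, 3) learnable per-frame
    RGB offsets, broadcast over all pixels of each frame."""
    delta = delta.clamp(-eps, eps)                     # bound flicker amplitude
    return (clip + delta[:, :, None, None]).clamp(0.0, 1.0)
```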