Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption
- URL: http://arxiv.org/abs/2207.10402v2
- Date: Sun, 25 Jun 2023 13:26:20 GMT
- Title: Detecting Deepfake by Creating Spatio-Temporal Regularity Disruption
- Authors: Jiazhi Guan, Hang Zhou, Mingming Gong, Errui Ding, Jingdong Wang,
Youjian Zhao
- Abstract summary: We propose to boost the generalization of deepfake detection by distinguishing the "regularity disruption" that does not appear in real videos.
Specifically, by carefully examining the spatial and temporal properties, we propose to disrupt a real video through a Pseudo-fake Generator.
Such practice allows us to achieve deepfake detection without using fake videos and improves the generalization ability in a simple and efficient manner.
- Score: 94.5031244215761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite encouraging progress in deepfake detection, generalization to unseen
forgery types remains a significant challenge due to the limited forgery clues
explored during training. In contrast, we notice a common phenomenon in
deepfake: fake video creation inevitably disrupts the statistical regularity in
original videos. Inspired by this observation, we propose to boost the
generalization of deepfake detection by distinguishing the "regularity
disruption" that does not appear in real videos. Specifically, by carefully
examining the spatial and temporal properties, we propose to disrupt a real
video through a Pseudo-fake Generator and create a wide range of pseudo-fake
videos for training. Such practice allows us to achieve deepfake detection
without using fake videos and improves the generalization ability in a simple
and efficient manner. To jointly capture the spatial and temporal disruptions,
we propose a Spatio-Temporal Enhancement block to learn the regularity
disruption across space and time on our self-created videos. Through
comprehensive experiments, our method exhibits excellent performance on several
datasets.
Related papers
- Learning Natural Consistency Representation for Face Forgery Video Detection [23.53549629885891]
We propose to learn the Natural representation (NACO) real face videos in a self-supervised manner.
Our method outperforms other state-of-the-art methods with impressive robustness.
arXiv Detail & Related papers (2024-07-15T09:00:02Z) - Compressed Deepfake Video Detection Based on 3D Spatiotemporal Trajectories [10.913345858983275]
Deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals.
In this paper, we propose a deepfake video detection method based on 3Dtemporal motion features.
Our method yields satisfactory results and showcases its potential for practical applications.
arXiv Detail & Related papers (2024-04-28T11:48:13Z) - Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies [14.660707087391463]
Deepfake videos present an increasing threat to society with potentially negative impact on criminal justice, democracy, and personal safety and privacy.
We propose a novel unsupervised method for detecting deepfake videos by directly identifying intra-modal and cross-modal inconsistency between video segments.
Our proposed method outperforms prior state-of-the-art unsupervised deepfake detection methods on the challenging FakeAVCeleb dataset.
arXiv Detail & Related papers (2023-11-28T03:28:19Z) - Undercover Deepfakes: Detecting Fake Segments in Videos [1.2609216345578933]
deepfake generation is a new paradigm of deepfakes which are mostly real videos altered slightly to distort the truth.
In this paper, we present a deepfake detection method that can address this issue by performing deepfake prediction at the frame and video levels.
In particular, the paradigm we address will form a powerful tool for the moderation of deepfakes, where human oversight can be better targeted to the parts of videos suspected of being deepfakes.
arXiv Detail & Related papers (2023-05-11T04:43:10Z) - Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z) - Deepfake Video Detection with Spatiotemporal Dropout Transformer [32.577096083927884]
This paper proposes a simple yet effective patch-level approach to facilitate deepfake video detection via a dropout transformer.
The approach reorganizes each input video into bag of patches that is then fed into a vision transformer to achieve robust representation.
arXiv Detail & Related papers (2022-07-14T02:04:42Z) - Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z) - Leveraging Real Talking Faces via Self-Supervision for Robust Forgery
Detection [112.96004727646115]
We develop a method to detect face-manipulated videos using real talking faces.
We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments.
Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
arXiv Detail & Related papers (2022-01-18T17:14:54Z) - Watch Those Words: Video Falsification Detection Using Word-Conditioned
Facial Motion [82.06128362686445]
We propose a multi-modal semantic forensic approach to handle both cheapfakes and visually persuasive deepfakes.
We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others.
Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation.
arXiv Detail & Related papers (2021-12-21T01:57:04Z) - Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery
Detection [118.37239586697139]
LipForensics is a detection approach capable of both generalising manipulations and withstanding various distortions.
It consists in first pretraining a-temporal network to perform visual speech recognition (lipreading)
A temporal network is subsequently finetuned on fixed mouth embeddings of real and forged data in order to detect fake videos based on mouth movements without over-fitting to low-level, manipulation-specific artefacts.
arXiv Detail & Related papers (2020-12-14T15:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.