Self-supervised Transformer for Deepfake Detection
- URL: http://arxiv.org/abs/2203.01265v1
- Date: Wed, 2 Mar 2022 17:44:40 GMT
- Title: Self-supervised Transformer for Deepfake Detection
- Authors: Hanqing Zhao, Wenbo Zhou, Dongdong Chen, Weiming Zhang and Nenghai Yu
- Abstract summary: The rapid spread of deepfake techniques in real-world scenarios requires stronger generalization abilities from face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised, transformer-based audio-visual contrastive learning method.
- Score: 112.81127845409002
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The fast evolution and widespread use of deepfake techniques in real-world
scenarios require stronger generalization abilities from face forgery detectors.
Some works strengthen generalization by capturing features that are not tied to
method-specific artifacts, such as blending-boundary clues and accumulated
up-sampling traces. However, the effectiveness of these methods is easily
degraded by post-processing operations such as compression. Inspired by transfer
learning, neural networks pre-trained on other large-scale face-related tasks may
provide useful features for deepfake detection. For example, lip movement has
proven to be a robust, well-transferring, high-level semantic feature that can be
learned from the lipreading task. However, the existing method pre-trains the lip
feature extraction model in a supervised manner, which requires substantial human
effort for data annotation and makes training data harder to obtain. In this
paper, we propose a self-supervised, transformer-based audio-visual contrastive
learning method. The proposed method learns mouth motion representations by
encouraging paired video and audio representations to be close while pushing
unpaired ones apart. After pre-training with our method, the model is partially
fine-tuned for the deepfake detection task. Extensive experiments show that our
self-supervised method performs comparably to, or even better than, its
supervised pre-training counterpart.
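The contrastive objective described in the abstract can be illustrated with a short, hypothetical sketch (not the authors' released code): paired video and audio clip embeddings are pulled together, while all unpaired combinations within a batch act as negatives, in the style of an InfoNCE loss. The encoder names, tensor shapes, and temperature value below are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def av_contrastive_loss(video_emb: torch.Tensor,
                        audio_emb: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style audio-visual contrastive loss (illustrative sketch).

    video_emb, audio_emb: (batch, dim) embeddings of temporally aligned
    mouth-crop clips and their audio; row i of both tensors comes from
    the same source video and forms the positive pair.
    """
    v = F.normalize(video_emb, dim=-1)
    a = F.normalize(audio_emb, dim=-1)
    logits = v @ a.t() / temperature                  # (batch, batch) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric cross-entropy: match video -> audio and audio -> video,
    # pulling paired embeddings together and pushing unpaired ones apart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Hypothetical usage; video_encoder/audio_encoder are placeholders, not from the paper:
# loss = av_contrastive_loss(video_encoder(mouth_clips), audio_encoder(spectrograms))
```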
Related papers
- Towards General Deepfake Detection with Dynamic Curriculum [4.622705420257596]
We propose to introduce sample hardness into the training of deepfake detectors via the curriculum learning paradigm.
We present a novel, simple yet effective strategy, named Dynamic Facial Forensic Curriculum (DFFC), which makes the model gradually focus on hard samples during training.
Comprehensive experiments show that DFFC can improve both within- and cross-dataset performance of various kinds of end-to-end deepfake detectors.
arXiv Detail & Related papers (2024-10-15T00:58:09Z)
- Semantics-Oriented Multitask Learning for DeepFake Detection: A Joint Embedding Approach [77.65459419417533]
We propose an automatic dataset expansion technique to support semantics-oriented DeepFake detection tasks.
We also resort to joint embedding of face images and their corresponding labels for prediction.
Our method improves the generalizability of DeepFake detection and provides a degree of model interpretability through human-understandable explanations.
arXiv Detail & Related papers (2024-08-29T07:11:50Z)
- UniForensics: Face Forgery Detection via General Facial Representation [60.5421627990707]
High-level semantic features are less susceptible to perturbations and not limited to forgery-specific artifacts, thus having stronger generalization.
We introduce UniForensics, a novel deepfake detection framework that leverages a transformer-based video network, with a meta-functional face classification for enriched facial representation.
arXiv Detail & Related papers (2024-07-26T20:51:54Z)
- Training-Free Deepfake Voice Recognition by Leveraging Large-Scale Pre-Trained Models [52.04189118767758]
Generalization is a major issue for current audio deepfake detectors.
In this paper we study the potential of large-scale pre-trained models for audio deepfake detection.
arXiv Detail & Related papers (2024-05-03T15:27:11Z)
- Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning [0.0]
Deepfake technology has raised concerns about the authenticity of digital content, necessitating the development of effective detection methods.
Adversaries can manipulate deepfake videos with small, imperceptible perturbations that can deceive the detection models into producing incorrect outputs.
We introduce Adversarial Feature Similarity Learning (AFSL), which integrates three fundamental deep feature learning paradigms.
arXiv Detail & Related papers (2024-02-06T11:35:05Z)
- FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection [10.36919027402249]
Synthetic videos of speaking humans can be used to spread misinformation in a convincing manner.
FakeOut is a novel approach that relies on multi-modal data throughout both the pre-training phase and the adaptation phase.
Our method achieves state-of-the-art results in cross-dataset generalization on audio-visual datasets.
arXiv Detail & Related papers (2022-12-01T18:56:31Z)
- DeepfakeUCL: Deepfake Detection via Unsupervised Contrastive Learning [20.94569893388119]
We design a novel deepfake detection method via unsupervised contrastive learning.
We show that our method enables comparable detection performance to state-of-the-art supervised techniques.
arXiv Detail & Related papers (2021-04-23T09:48:10Z)
- Towards Generalizable and Robust Face Manipulation Detection via Bag-of-local-feature [55.47546606878931]
We propose a novel method for face manipulation detection that improves generalization ability and robustness via a bag-of-local-feature representation.
Specifically, we extend Transformers with a bag-of-feature approach to encode inter-patch relationships, allowing the model to learn local forgery features without any explicit supervision.
arXiv Detail & Related papers (2021-03-14T12:50:48Z)
- Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection [118.37239586697139]
LipForensics is a detection approach capable of both generalising to unseen manipulations and withstanding various distortions.
It consists of first pretraining a spatio-temporal network to perform visual speech recognition (lipreading).
A temporal network is subsequently fine-tuned on fixed mouth embeddings of real and forged data in order to detect fake videos based on mouth movements without over-fitting to low-level, manipulation-specific artefacts.
arXiv Detail & Related papers (2020-12-14T15:53:56Z)
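Both the paper above and LipForensics end with a partial fine-tuning stage: the pretrained mouth-motion feature extractor is kept (largely) frozen while a lightweight temporal classifier is trained on real/fake labels. The sketch below is a minimal, hypothetical illustration of that pattern; the module names, feature dimension, and head architecture are assumptions rather than details taken from either paper.

```python
import torch
import torch.nn as nn

class DeepfakeClassifier(nn.Module):
    """Illustrative partial fine-tuning: frozen pretrained encoder + trainable temporal head."""

    def __init__(self, pretrained_encoder: nn.Module, feat_dim: int = 512):
        super().__init__()
        self.encoder = pretrained_encoder
        for p in self.encoder.parameters():      # freeze the pretrained mouth-motion encoder
            p.requires_grad = False
        self.temporal = nn.GRU(feat_dim, 256, batch_first=True)  # small trainable head
        self.fc = nn.Linear(256, 1)              # single real-vs-fake logit

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, ...) mouth crops; the encoder is assumed to
        # return per-frame features of shape (batch, time, feat_dim).
        feats = self.encoder(clips)
        out, _ = self.temporal(feats)
        return self.fc(out[:, -1])               # classify from the final time step
```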
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.