Deepfake Detection via Joint Unsupervised Reconstruction and Supervised
Classification
- URL: http://arxiv.org/abs/2211.13424v1
- Date: Thu, 24 Nov 2022 05:44:26 GMT
- Title: Deepfake Detection via Joint Unsupervised Reconstruction and Supervised
Classification
- Authors: Bosheng Yan, Xuequan Lu, Chang-Tsun Li
- Abstract summary: We introduce a novel approach for deepfake detection, which considers the reconstruction and classification tasks simultaneously.
This method shares the information learned by one task with the other, a form of cross-task interaction that existing works rarely consider.
Our method achieves state-of-the-art performance on three commonly-used datasets.
- Score: 25.84902508816679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has enabled realistic face manipulation (i.e., deepfake), which
poses significant concerns over the integrity of the media in circulation. Most
existing deep learning techniques for deepfake detection can achieve promising
performance in the intra-dataset evaluation setting (i.e., training and testing
on the same dataset), but are unable to perform satisfactorily in the
inter-dataset evaluation setting (i.e., training on one dataset and testing on
another). Most of the previous methods use the backbone network to extract
global features for making predictions and only employ binary supervision
(i.e., indicating whether the training instances are fake or authentic) to
train the network. Classification based merely on global features often
leads to weak generalizability to unseen manipulation
methods. In addition, the reconstruction task can improve the learned
representations. In this paper, we introduce a novel approach for deepfake
detection, which considers the reconstruction and classification tasks
simultaneously to address these problems. This method shares the information
learned by one task with the other, a form of cross-task interaction that
existing works rarely consider, and hence boosts the overall performance. In
particular, we design a two-branch Convolutional AutoEncoder (CAE), in which
the Convolutional Encoder used to compress the feature map into the latent
representation is shared by both branches. Then the latent representation of
the input data is fed to a simple classifier and the unsupervised
reconstruction component simultaneously. Our network is trained end-to-end.
Experiments demonstrate that our method achieves state-of-the-art performance
on three commonly-used datasets, particularly in the cross-dataset evaluation
setting.
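The described architecture can be sketched as follows. This is a minimal illustration only, with dense layers standing in for the paper's convolutional encoder/decoder; all dimensions, the loss-weighting factor, and the variable names are assumptions for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical dimensions (not from the paper)
batch, height, width, channels = 4, 8, 8, 3
latent_dim = 16

x = rng.standard_normal((batch, height, width, channels))
W_enc = rng.standard_normal((height * width * channels, latent_dim)) * 0.1
W_dec = rng.standard_normal((latent_dim, height * width * channels)) * 0.1
W_cls = rng.standard_normal((latent_dim, 1)) * 0.1

def encoder(x, w):
    # stand-in for the shared Convolutional Encoder:
    # flatten the input and project it into the latent space
    return np.tanh(x.reshape(x.shape[0], -1) @ w)

# shared latent representation used by both branches
z = encoder(x, W_enc)

# branch 1: unsupervised reconstruction (decoder)
x_hat = (z @ W_dec).reshape(x.shape)
recon_loss = np.mean((x_hat - x) ** 2)

# branch 2: supervised real/fake classification (simple classifier head)
logits = z @ W_cls
p = 1.0 / (1.0 + np.exp(-logits))
y = rng.integers(0, 2, size=(batch, 1))  # dummy fake/authentic labels
cls_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# joint end-to-end objective; the 0.5 weighting is a hypothetical choice
total_loss = cls_loss + 0.5 * recon_loss
```

The key design point mirrored here is that a single encoder produces the latent code consumed by both the reconstruction and classification heads, so gradients from both losses shape the same representation during end-to-end training.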
Related papers
- Enhancing Hyperspectral Image Prediction with Contrastive Learning in Low-Label Regime [0.810304644344495]
Self-supervised contrastive learning is an effective approach for addressing the challenge of limited labelled data.
We evaluate the method's performance for both the single-label and multi-label classification tasks.
arXiv Detail & Related papers (2024-10-10T10:20:16Z) - A Study of Forward-Forward Algorithm for Self-Supervised Learning [65.268245109828]
We study the performance of forward-forward vs. backpropagation for self-supervised representation learning.
Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-supervised) training, the transfer performance is significantly lagging behind in all the studied settings.
arXiv Detail & Related papers (2023-09-21T10:14:53Z) - ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z) - Self-Supervised Visual Place Recognition by Mining Temporal and Feature
Neighborhoods [17.852415436033436]
We propose a novel framework named TF-VPR that uses temporal neighborhoods and learnable feature neighborhoods to discover unknown spatial neighborhoods.
Our method follows an iterative training paradigm which alternates between: (1) representation learning with data augmentation, (2) positive set expansion to include the current feature space neighbors, and (3) positive set contraction via geometric verification.
arXiv Detail & Related papers (2022-08-19T12:59:46Z) - PARTICUL: Part Identification with Confidence measure using Unsupervised
Learning [0.0]
PARTICUL is a novel algorithm for unsupervised learning of part detectors from datasets used in fine-grained recognition.
It exploits the macro-similarities of all images in the training set in order to mine for recurring patterns in the feature space of a pre-trained convolutional neural network.
We show that our detectors can consistently highlight parts of the object while providing a good measure of the confidence in their prediction.
arXiv Detail & Related papers (2022-06-27T13:44:49Z) - Self-supervised Transformer for Deepfake Detection [112.81127845409002]
Deepfake techniques in real-world scenarios require stronger generalization abilities of face forgery detectors.
Inspired by transfer learning, neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection.
In this paper, we propose a self-supervised transformer based audio-visual contrastive learning method.
arXiv Detail & Related papers (2022-03-02T17:44:40Z) - X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task
Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z) - MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake
Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed as MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Adaptive Prototypical Networks with Label Words and Joint Representation
Learning for Few-Shot Relation Classification [17.237331828747006]
This work focuses on few-shot relation classification (FSRC)
We propose an adaptive mixture mechanism to add label words to the representation of the class prototype.
Experiments have been conducted on FewRel under different few-shot (FS) settings.
arXiv Detail & Related papers (2021-01-10T11:25:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.