Deepfake Video Detection Using Generative Convolutional Vision
Transformer
- URL: http://arxiv.org/abs/2307.07036v1
- Date: Thu, 13 Jul 2023 19:27:40 GMT
- Title: Deepfake Video Detection Using Generative Convolutional Vision
Transformer
- Authors: Deressa Wodajo, Solomon Atnafu, Zahid Akhtar
- Abstract summary: We propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection.
Our model combines ConvNeXt and Swin Transformer models for feature extraction.
By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deepfakes have raised significant concerns due to their potential to spread
false information and compromise digital media integrity. In this work, we
propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake
video detection. Our model combines ConvNeXt and Swin Transformer models for
feature extraction, and it utilizes Autoencoder and Variational Autoencoder to
learn from the latent data distribution. By learning from the visual artifacts
and latent data distribution, GenConViT achieves improved performance in
detecting a wide range of deepfake videos. The model is trained and evaluated
on DFDC, FF++, DeepfakeTIMIT, and Celeb-DF v2 datasets, achieving high
classification accuracy, F1 scores, and AUC values. The proposed GenConViT
model demonstrates robust performance in deepfake video detection, with an
average accuracy of 95.8% and an AUC value of 99.3% across the tested datasets.
Our proposed model addresses the challenge of generalizability in deepfake
detection by leveraging visual and latent features and providing an effective
solution for identifying a wide range of fake videos while preserving media
integrity. The code for GenConViT is available at
https://github.com/erprogs/GenConViT.
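Per the abstract, GenConViT pairs a ConvNeXt-Swin feature extractor with an Autoencoder and a Variational Autoencoder, so each video frame effectively receives a prediction informed by both visual artifacts and the latent data distribution. The actual architecture lives in the linked repository; the sketch below shows only a plausible score-fusion step, with hypothetical function names: average the per-frame "fake" probabilities from the two branches, then mean-pool across frames for a video-level verdict.

```python
# Minimal, framework-free sketch of GenConViT-style score fusion.
# All names are hypothetical; see the repository for the real model.

def fuse_branch_scores(ae_scores, vae_scores):
    """Average the AE-branch and VAE-branch fake probabilities per frame."""
    assert len(ae_scores) == len(vae_scores)
    return [(a + v) / 2.0 for a, v in zip(ae_scores, vae_scores)]

def video_verdict(frame_scores, threshold=0.5):
    """Mean-pool frame scores; call the video fake above the threshold."""
    mean_score = sum(frame_scores) / len(frame_scores)
    return ("fake" if mean_score > threshold else "real"), mean_score

# Example: both branches lean toward "fake" on a 3-frame clip.
fused = fuse_branch_scores([0.6, 0.7, 0.8], [0.8, 0.9, 0.7])
label, score = video_verdict(fused)
```

Averaging branch scores is one common late-fusion choice; a learned fusion layer over concatenated features is an equally plausible reading of the abstract.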
Related papers
- CapST: An Enhanced and Lightweight Model Attribution Approach for
Synthetic Videos [9.209808258321559]
This paper investigates the model attribution problem of Deepfake videos using a recently proposed dataset, Deepfakes from Different Models (DFDM).
The dataset comprises 6,450 Deepfake videos generated by five distinct models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio.
Experimental results on the deepfake benchmark dataset (DFDM) demonstrate the efficacy of our proposed method, achieving up to a 4% improvement in accurately categorizing deepfake videos.
arXiv Detail & Related papers (2023-11-07T08:05:09Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual modality or only the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- Hybrid Transformer Network for Deepfake Detection [2.644723682054489]
We propose a novel hybrid transformer network utilizing early feature fusion strategy for deepfake video detection.
Our model achieves results comparable to more advanced state-of-the-art approaches when evaluated on the FaceForensics++ and DFDC benchmarks.
We also propose novel face cut-out augmentations, as well as random cut-out augmentations.
arXiv Detail & Related papers (2022-08-11T13:30:42Z)
- Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches focus on exploring the specific artifacts in deepfake videos.
We propose to perform the deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z)
- Model Attribution of Face-swap Deepfake Videos [39.771800841412414]
We first introduce a new dataset with DeepFakes from Different Models (DFDM) based on several Autoencoder models.
Specifically, five generation models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio have been used to generate a total of 6,450 Deepfake videos.
We take Deepfakes model attribution as a multiclass classification task and propose a spatial and temporal attention based method to explore the differences among Deepfakes.
arXiv Detail & Related papers (2022-02-25T20:05:18Z)
- Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are often indistinguishable from real media to the human eye.
We propose a novel fake-detection method that re-synthesizes testing images and extracts visual cues for detection.
We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z)
- M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z)
- Deepfake Detection Scheme Based on Vision Transformer and Distillation [4.716110829725784]
We propose a Vision Transformer model with distillation methodology for detecting fake videos.
We verify that the proposed scheme with patch embedding as input outperforms the state-of-the-art using the combined CNN features.
arXiv Detail & Related papers (2021-04-03T09:13:05Z)
- Deepfake Video Detection Using Convolutional Vision Transformer [0.0]
Deep learning techniques can generate and synthesize hyper-realistic videos known as Deepfakes.
Deepfakes pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams.
We propose a Convolutional Vision Transformer for the detection of Deepfakes.
arXiv Detail & Related papers (2021-02-22T15:56:05Z)
- Adversarially Robust Deepfake Media Detection Using Fused Convolutional Neural Network Predictions [79.00202519223662]
Current deepfake detection systems struggle against unseen data.
We employ three different deep Convolutional Neural Network (CNN) models to classify fake and real images extracted from videos.
The proposed technique outperforms state-of-the-art models with 96.5% accuracy.
arXiv Detail & Related papers (2021-02-11T11:28:00Z)
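The random cut-out augmentations mentioned in the Hybrid Transformer Network entry above can be illustrated with a small, self-contained sketch. This is a generic cut-out (occluding a random rectangular patch of the input), not the paper's exact face-aware variant; the function name and signature are hypothetical.

```python
import random

def random_cutout(image, patch_h, patch_w, fill=0, rng=None):
    """Occlude one randomly placed patch_h x patch_w patch of a 2-D image
    (nested lists), mimicking a random cut-out augmentation."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    top = rng.randrange(0, h - patch_h + 1)
    left = rng.randrange(0, w - patch_w + 1)
    out = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(top, top + patch_h):
        for c in range(left, left + patch_w):
            out[r][c] = fill
    return out

# Example: cut a 2x2 hole out of a 4x4 image of ones.
img = [[1] * 4 for _ in range(4)]
aug = random_cutout(img, 2, 2, rng=random.Random(0))
```

Face cut-out variants typically place the patch over detected facial landmarks instead of a uniformly random location, forcing the detector not to over-rely on any single facial region.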
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.