Deepfake Video Detection Using Convolutional Vision Transformer
- URL: http://arxiv.org/abs/2102.11126v1
- Date: Mon, 22 Feb 2021 15:56:05 GMT
- Title: Deepfake Video Detection Using Convolutional Vision Transformer
- Authors: Deressa Wodajo, Solomon Atnafu
- Abstract summary: Deep learning techniques can generate and synthesize hyper-realistic videos known as Deepfakes.
Deepfakes pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams.
We propose a Convolutional Vision Transformer for the detection of Deepfakes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of deep learning models that can generate and
synthesize hyper-realistic videos, known as Deepfakes, and their ease of access
to the general public have raised concerns from all stakeholders about their
possible malicious use. Deep learning techniques can now generate faces, swap
faces between two subjects in a video, alter facial expressions, change gender,
and alter facial features, to list a few. These powerful video manipulation
methods have potential use in many fields. However, they also pose a looming
threat to everyone if used for harmful purposes such as identity theft,
phishing, and scams. In this work, we propose a Convolutional Vision Transformer
for the detection of Deepfakes. The Convolutional Vision Transformer has two
components: Convolutional Neural Network (CNN) and Vision Transformer (ViT).
The CNN extracts learnable features while the ViT takes in the learned features
as input and categorizes them using an attention mechanism. We trained our
model on the DeepFake Detection Challenge Dataset (DFDC) and have achieved 91.5
percent accuracy, an AUC value of 0.91, and a loss value of 0.32. Our
contribution is that we have added a CNN module to the ViT architecture and
have achieved a competitive result on the DFDC dataset.
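The abstract's two-stage design (a CNN that extracts feature maps, followed by a ViT that attends over them to classify) can be sketched in PyTorch. The sketch below is a minimal illustration under assumed layer widths, depth, and tokenization; it is not the authors' exact architecture, and positional embeddings are omitted for brevity.

```python
# Minimal sketch of the CNN + ViT pipeline described in the abstract.
# Layer widths, depth, and tokenization are illustrative assumptions,
# not the authors' exact architecture; positional embeddings are omitted.
import torch
import torch.nn as nn

class ConvolutionalViT(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, depth=4, num_classes=2):
        super().__init__()
        # CNN component: extracts learnable feature maps from a face crop.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, embed_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # ViT component: attends over the CNN features and classifies.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)  # real vs. fake

    def forward(self, x):                          # x: (B, 3, H, W)
        feats = self.cnn(x)                        # (B, C, H', W')
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        z = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(z[:, 0])                  # classify from CLS token

logits = ConvolutionalViT()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```

Training such a model with binary cross-entropy on face crops extracted from DFDC frames would mirror the evaluation setup reported above.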
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion, and financial fraud.
In our research we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of a face's presence in a video.
We employ our approach to analyze videos in which multiple faces are simultaneously present.
arXiv Detail & Related papers (2024-10-10T13:10:34Z)
- Deepfake Video Detection Using Generative Convolutional Vision Transformer [3.8297637120486496]
We propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection.
Our model combines ConvNeXt and Swin Transformer models for feature extraction.
By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos.
arXiv Detail & Related papers (2023-07-13T19:27:40Z)
- Hybrid Transformer Network for Deepfake Detection [2.644723682054489]
We propose a novel hybrid transformer network utilizing early feature fusion strategy for deepfake video detection.
Our model achieves comparable results to other more advanced state-of-the-art approaches when evaluated on FaceForensics++ and DFDC benchmarks.
We also propose novel face cut-out augmentations, as well as random cut-out augmentations.
arXiv Detail & Related papers (2022-08-11T13:30:42Z)
- Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis [60.13902294276283]
We present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated).
Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face.
Our analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham.
arXiv Detail & Related papers (2022-07-26T17:39:04Z)
- Copy Motion From One to Another: Fake Motion Video Generation [53.676020148034034]
A compelling application of artificial intelligence is to generate a video of a target person performing arbitrary desired motion.
Current methods typically employ GANs with an L2 loss to assess the authenticity of the generated videos.
We propose a theoretically motivated Gromov-Wasserstein loss that facilitates learning the mapping from a pose to a foreground image.
Our method is able to generate realistic target person videos, faithfully copying complex motions from a source person.
arXiv Detail & Related papers (2022-05-03T08:45:22Z)
- Audio-Visual Person-of-Interest DeepFake Detection [77.04789677645682]
The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world.
We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity.
Our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos.
arXiv Detail & Related papers (2022-04-06T20:51:40Z)
- Video Transformer for Deepfake Detection with Incremental Learning [11.586926513803077]
Deepfake face forgery is widespread on the internet, raising severe societal concerns.
We propose a novel video transformer with incremental learning for detecting deepfake videos.
arXiv Detail & Related papers (2021-08-11T16:22:56Z)
- Combining EfficientNet and Vision Transformers for Video Deepfake Detection [6.365889364810238]
Deepfakes are the result of digital manipulation used to produce credible videos that deceive the viewer.
In this study, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor.
The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake Detection Challenge (DFDC) dataset; a generic sketch of this CNN-plus-transformer pattern appears after this list.
arXiv Detail & Related papers (2021-07-06T13:35:11Z)
- Deepfake Detection Scheme Based on Vision Transformer and Distillation [4.716110829725784]
We propose a Vision Transformer model with distillation methodology for detecting fake videos.
We verify that the proposed scheme, which takes patch embeddings as input, outperforms state-of-the-art methods that use combined CNN features.
arXiv Detail & Related papers (2021-04-03T09:13:05Z)
- Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting.
We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains.
The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)
- Adversarially robust deepfake media detection using fused convolutional neural network predictions [79.00202519223662]
Current deepfake detection systems struggle against unseen data.
We employ three different deep Convolutional Neural Network (CNN) models to classify fake and real images extracted from videos.
The proposed technique outperforms state-of-the-art models with 96.5% accuracy.
arXiv Detail & Related papers (2021-02-11T11:28:00Z)
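The EfficientNet-plus-Vision-Transformer entry above follows the same recipe as the main paper: a convolutional backbone tokenizes the image for an attention-based classifier. Below is a minimal sketch using the timm library's EfficientNet-B0 as the feature extractor; the projection, transformer depth, and pooling are illustrative assumptions, not that paper's configuration.

```python
# Hedged sketch: EfficientNet-B0 feature maps feeding a transformer
# encoder, in the spirit of the "Combining EfficientNet and Vision
# Transformers" entry above. Head sizes and depth are assumptions.
import timm
import torch
import torch.nn as nn

class EffNetViT(nn.Module):
    def __init__(self, embed_dim=256, num_heads=8, depth=2, num_classes=2):
        super().__init__()
        # features_only=True makes timm return intermediate feature maps.
        self.backbone = timm.create_model(
            "efficientnet_b0", pretrained=False, features_only=True)
        cnn_dim = self.backbone.feature_info.channels()[-1]  # last-stage width
        self.proj = nn.Linear(cnn_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        fmap = self.backbone(x)[-1]                    # (B, C, H', W')
        tokens = self.proj(fmap.flatten(2).transpose(1, 2))
        z = self.encoder(tokens)
        return self.head(z.mean(dim=1))                # mean-pool, classify

print(EffNetViT()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 2])
```

Mean-pooling the tokens avoids committing to a CLS-token design, which the cited paper may or may not use.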