Combining EfficientNet and Vision Transformers for Video Deepfake
Detection
- URL: http://arxiv.org/abs/2107.02612v1
- Date: Tue, 6 Jul 2021 13:35:11 GMT
- Title: Combining EfficientNet and Vision Transformers for Video Deepfake
Detection
- Authors: Davide Coccomini, Nicola Messina, Claudio Gennaro and Fabrizio Falchi
- Abstract summary: Deepfakes are the result of digital manipulation to obtain credible videos in order to deceive the viewer.
In this study, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor.
The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake Detection Challenge (DFDC).
- Score: 6.365889364810238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deepfakes are the result of digital manipulation to obtain credible
videos in order to deceive the viewer. This is done through deep learning
techniques based on autoencoders or GANs that become more accessible and
accurate year after year, resulting in fake videos that are very difficult to
distinguish from real ones. Traditionally, CNNs have been used to perform
deepfake detection, with the best results obtained using methods based on
EfficientNet B7. In this study, we combine various types of Vision
Transformers with a convolutional EfficientNet B0 used as a feature extractor,
obtaining results comparable to those of some very recent methods that use
Vision Transformers. Unlike the state-of-the-art approaches, we use neither
distillation nor ensemble methods. The best model achieved an AUC of 0.951 and
an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake
Detection Challenge (DFDC).
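As a rough illustration of the pipeline the abstract describes, the sketch below feeds EfficientNet-B0 feature maps into a small Transformer encoder with a classification token. The projection width, encoder depth, positional embedding, and head are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (assumed details): EfficientNet-B0 extracts a 7x7 grid of
# patch features from a 224x224 face crop; a Transformer encoder classifies
# the resulting token sequence via a learnable CLS token.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class EffNetViTDetector(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        self.features = efficientnet_b0(weights=None).features   # -> (B, 1280, 7, 7)
        self.proj = nn.Conv2d(1280, dim, kernel_size=1)           # channel projection
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))           # classification token
        self.pos = nn.Parameter(torch.zeros(1, 50, dim))          # 49 patches + CLS
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)                             # real/fake logit

    def forward(self, x):                                         # x: (B, 3, 224, 224)
        f = self.proj(self.features(x))                           # (B, dim, 7, 7)
        tokens = f.flatten(2).transpose(1, 2)                     # (B, 49, dim)
        tokens = torch.cat([self.cls.expand(x.size(0), -1, -1), tokens], dim=1)
        out = self.encoder(tokens + self.pos)
        return self.head(out[:, 0])                               # classify from CLS

logit = EffNetViTDetector()(torch.randn(2, 3, 224, 224))          # sigmoid(logit) = fake probability
```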
Related papers
- Deepfake detection in videos with multiple faces using geometric-fakeness features [79.16635054977068]
Deepfakes of victims or public figures can be used by fraudsters for blackmail, extortion and financial fraud.
In our research we propose to use geometric-fakeness features (GFF) that characterize the dynamic degree of a face's presence in a video.
We employ our approach to analyze videos in which multiple faces are simultaneously present.
arXiv Detail & Related papers (2024-10-10T13:10:34Z)
- Unmasking Deepfake Faces from Videos Using An Explainable Cost-Sensitive Deep Learning Approach [0.0]
Deepfake technology is now widely used, raising serious concerns about the authenticity of digital media.
This study employs a resource-effective and transparent cost-sensitive deep learning method to effectively detect deepfake faces in videos.
arXiv Detail & Related papers (2023-12-17T14:57:10Z)
- AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos utilizes only the visual or the audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z)
- Deepfake Video Detection Using Generative Convolutional Vision Transformer [3.8297637120486496]
We propose a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection.
Our model combines ConvNeXt and Swin Transformer models for feature extraction.
By learning from the visual artifacts and latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos.
arXiv Detail & Related papers (2023-07-13T19:27:40Z)
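The GenConViT entry above combines two backbones for feature extraction; a minimal sketch of just that dual-backbone step might look as follows. The generative (latent-distribution) part of GenConViT is omitted, and the torchvision model variants, concatenation fusion, and head are assumptions.

```python
# Hypothetical sketch of dual-backbone feature extraction: pooled ConvNeXt and
# Swin Transformer embeddings are concatenated and classified. Backbone sizes
# and the fusion scheme are illustrative, not the paper's exact configuration.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny, swin_t

class DualBackboneDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.convnext = convnext_tiny(weights=None)
        self.convnext.classifier[2] = nn.Identity()   # pooled 768-d ConvNeXt features
        self.swin = swin_t(weights=None)
        self.swin.head = nn.Identity()                # pooled 768-d Swin features
        self.head = nn.Sequential(
            nn.Linear(768 + 768, 256), nn.GELU(), nn.Linear(256, 1)
        )

    def forward(self, x):                             # x: (B, 3, 224, 224)
        fused = torch.cat([self.convnext(x), self.swin(x)], dim=1)
        return self.head(fused)                       # real/fake logit

logit = DualBackboneDetector()(torch.randn(2, 3, 224, 224))
```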
- Voice-Face Homogeneity Tells Deepfake [56.334968246631725]
Existing detection approaches focus on exploring the specific artifacts in deepfake videos.
We propose to perform deepfake detection from an unexplored voice-face matching view.
Our model obtains significantly improved performance as compared to other state-of-the-art competitors.
arXiv Detail & Related papers (2022-03-04T09:08:50Z) - M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z) - Deepfake Detection Scheme Based on Vision Transformer and Distillation [4.716110829725784]
We propose a Vision Transformer model with distillation methodology for detecting fake videos.
We verify that the proposed scheme with patch embedding as input outperforms the state-of-the-art when using the combined CNN features.
arXiv Detail & Related papers (2021-04-03T09:13:05Z) - Deepfake Video Detection Using Convolutional Vision Transformer [0.0]
Deep learning techniques can generate and synthesize hyper-realistic videos known as Deepfakes.
Deepfakes pose a looming threat to everyone if used for harmful purposes such as identity theft, phishing, and scams.
We propose a Convolutional Vision Transformer for the detection of Deepfakes.
arXiv Detail & Related papers (2021-02-22T15:56:05Z)
- Adversarially robust deepfake media detection using fused convolutional neural network predictions [79.00202519223662]
Current deepfake detection systems struggle against unseen data.
We employ three different deep Convolutional Neural Network (CNN) models to classify fake and real images extracted from videos.
The proposed technique outperforms state-of-the-art models with 96.5% accuracy.
arXiv Detail & Related papers (2021-02-11T11:28:00Z)
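The fused-CNN entry above averages the predictions of several convolutional networks; a minimal sketch of that prediction-level fusion, with assumed backbone choices and a simple sigmoid-averaging rule, could look like this.

```python
# Hypothetical sketch of prediction-level fusion across three CNNs. The
# specific backbones and the plain averaging rule are illustrative
# assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision.models import resnet18, densenet121, efficientnet_b0

# Three CNNs, each with its classifier replaced by a single-logit head.
resnet = resnet18(weights=None)
resnet.fc = nn.Linear(resnet.fc.in_features, 1)
densenet = densenet121(weights=None)
densenet.classifier = nn.Linear(densenet.classifier.in_features, 1)
effnet = efficientnet_b0(weights=None)
effnet.classifier[1] = nn.Linear(effnet.classifier[1].in_features, 1)
for m in (resnet, densenet, effnet):
    m.eval()                                       # inference mode

@torch.no_grad()
def fused_fake_probability(frame):                 # frame: (B, 3, 224, 224)
    probs = [torch.sigmoid(m(frame)) for m in (resnet, densenet, effnet)]
    return torch.stack(probs).mean(dim=0)          # average per-model probabilities

score = fused_fake_probability(torch.randn(2, 3, 224, 224))
```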
- Emotions Don't Lie: An Audio-Visual Deepfake Detection Method Using Affective Cues [75.1731999380562]
We present a learning-based method for distinguishing real from deepfake multimedia content.
We extract and analyze the similarity between the audio and visual modalities from within the same video.
We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets.
arXiv Detail & Related papers (2020-03-14T22:07:26Z)
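The affective-cues entry above scores a video by the agreement between its audio and visual streams; a minimal sketch of such modality-similarity scoring, with assumed encoders and feature dimensions, is shown below.

```python
# Hypothetical sketch of audio-visual consistency scoring: embed each modality
# into a shared space and score a video by the cosine similarity of its audio
# and visual embeddings (low similarity -> likely fake). Both encoders and the
# 128-d embedding size are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)    # unit-norm embedding

audio_enc = ModalityEncoder(in_dim=40)             # e.g. pooled MFCC features (assumed)
visual_enc = ModalityEncoder(in_dim=512)           # e.g. pooled face-crop CNN features (assumed)

def av_consistency(audio_feats, visual_feats):
    # Cosine similarity per video; thresholding this score flags mismatched
    # (manipulated) audio-visual pairs.
    return (audio_enc(audio_feats) * visual_enc(visual_feats)).sum(dim=-1)

score = av_consistency(torch.randn(2, 40), torch.randn(2, 512))
```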
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.